FOREWORD


Dear  Workshop  Participant 

Welcome  to  the  Second  Workshop  in  Defense  Applications  of  Signal  Processing  and  to 
Starved  Rock  Resort.  In  the  tradition  of  its  predecessor,  held  at  Victor  Harbor,  South 
Australia  in  June  1997,  the  objective  is  to  gather  world-class  researchers  for  stimulating  and 
thought  provoking  discussion  on  signal  processing  as  it  relates  to  defense  applications.  While 
DASP  97  succeeded  in  bringing  together  about  55  researchers  from  both  defense  laboratories 
and universities in the USA and Australia, on this occasion there are about 60 participants, 25
from  Australia  and  35  from  the  USA,  again  with  a  mix  of  lab  people  and  academics. 

The motivation for this second workshop was developed at the first one. The venue, timing,
and kind of conference have changed many times in the intervening two years, and, at times, it
has seemed doomed not to happen. That it has is largely due to the efforts of a few people
who have worked hard over the last few months to bring us all together here at this time.
They  are: 

Mark  Smith  (Georgia  Tech),  Bill  Moran  (Flinders  Uni  and  CSSIP),  Alan  Lindsey  (Rome 
Lab), Jim Schroeder (CSSIP), Lang White (Adelaide Uni and CSSIP), Major Michele
Gaudreault  (AOARD),  Marian  Viola  (DSTO),  Jon  Sjogren  (AFOSR) 

We  also  wish  to  pay  tribute  to  the  funding  bodies  for  this  Workshop.  They  are: 

The  Electronic  and  Surveillance  Laboratories  of  DSTO, 

US  Defense  Advanced  Research  Projects  Agency  (DARPA), 

US  Air  Force  Office  of  Scientific  Research  (AFOSR)  and 
Asian  Office  of  Aerospace  Research  and  Development  (AOARD). 

We  wish  to  thank  them  all  for  their  generosity  in  making  this  event  possible. 

The workshop will open with a welcome reception on Sunday evening. During the course of
the  workshop  registrants  may  wish  to  engage  in  tennis  or  swimming,  go  horseback  riding  or 
hiking,  or  participate  in  any  of  the  other  recreational  activities  available  at  the  Park.  The 
formal  program  will  conclude  with  lunch  on  Thursday.  A  field  trip  to  Chicago  for  our 
Australian  guests,  complete  with  a  boat  tour  of  the  city,  shopping  at  the  Navy  Pier  (ride  the 
Ferris wheel!) and then Chicago-style pizza is planned for Thursday afternoon and evening.

Our  aim,  as  last  time,  is  to  foster  collaboration  among  the  various  communities  represented 
here  and,  in  particular,  between  the  US  and  Australian  researchers  in  defense  signal 
processing.  To  this  end,  we  have  put  together  a  formal  program  of  lectures  and  poster 
sessions,  together  with  several  more  informal  events.  We  hope  that  you  will  work  and  play 
together  over  the  next  few  days.  Attend  the  talks  and  poster  sessions,  but  above  all  interact 
with  participants  other  than  the  ones  you  see  regularly.  Make  sure  that,  when  you  leave  here, 
you  have  found  at  least  one  person  from  the  other  side  of  the  Pacific  with  whom  you  have 
agreed  to  "Keep  in  touch"  or  "Exchange  ideas"  or  better  still  "Collaborate".  By  doing  so  you 
help  to  make  this  workshop  a  success. 
Nonuniform Linear Antenna Array Design and Signal
Processing  for  DOA  Estimation  of  Gaussian  Sources 

Yuri I. Abramovich and Nicholas K. Spencer
Cooperative Research Centre for Sensor Signal and Information Processing (CSSIP)
E-mail: yuri@cssip.edu.au and nspencer@cssip.edu.au


This paper discusses the problem of direction-of-arrival (DOA) estimation
for Gaussian sources that are arbitrarily correlated, from independent
to fully correlated. For independent sources, the antenna array design
is governed by two competing considerations: maximum aperture, which inclines
towards increasing sparsity for a given number of array sensors, and
identifiability, which tends to exclude extreme sparsity. For fully correlated
sources, these two competing criteria are augmented by a third, which allows
for the initialisation of DOA estimation by the generalised spatial smoothing
(GSS) technique. The maximum number of fully correlated sources is
in turn an important factor in the GSS algorithm and subsequent array
geometry design. We present a geometry optimisation technique that permits
accurate DOA estimation of arbitrarily correlated sources.


Key  Words:  sparse  linear  arrays,  direction-of-arrival  estimation,  multimode,  independent 
sources 


1.  INTRODUCTION 

In  many  direction-finding  applications,  the  number  of  antenna  elements  available 
for  the  construction  of  an  array  is  limited,  in  which  case  the  problem  of  optimum 
antenna  geometry  for  a  fixed  number  of  elements  M  naturally  arises.  For  linear 
arrays,  solutions  to  this  problem  belong  to  the  class  of  nonuniformly-spaced  linear 
arrays  (NLA’s),  also  known  as  sparse  or  aperiodic  arrays,  and  several  different 
approaches  currently  exist  which  seek  the  “best”  design. 

Meanwhile,  speculations  regarding  optimum  sparse  geometry  have  mostly  been 
made  for  the  independent  (Gaussian)  source  model.  In  particular,  the  well-known 
suggestion  that  the  minimum-redundancy  criterion  is  appropriate  for  (integer)  NLA 
geometry  optimisation  [1]  is  based  on  the  simple  fact  that  such  geometries  generate 
a  contiguous  (“gapless”)  set  of  spatial  covariance  lags.  For  independent  Gaussian 
sources, this property immediately allows for the unambiguous estimation of up to

$m \le \tfrac{1}{2} M(M-1)$    (1)

DOA’s  by  the  direct  augmentation  approach  (DAA)  of  Pillai  et  al.  [2]. 

On  the  other  hand,  it  has  been  demonstrated  in  [3]  that  for  arbitrary  and  in 
particular  fully  correlated  Gaussian  sources,  manifold  ambiguity  leads  to  non- 
identifiability.  Furthermore,  it  has  been  demonstrated  by  Proukakis  and  Manikas 
[4]  that  scenarios  with  an  ambiguous  manifold  (i.e.  with  linearly  dependent  manifold 
“steering”  vectors)  always  exist  for  sparse  arrays. 

This  property  obviously  means  that  the  number  of  identifiable  arbitrary  (fully) 
correlated  sources  in  sparse  antenna  arrays  is  always  significantly  less  than  the 
number of antenna elements. Even for uniform M-element antenna arrays, the
traditional  spatial  smoothing  technique  [5]  allows  for  unambiguous  DOA  estimation 
of  up  to  fully  correlated  sources. 

For sparse antenna arrays, the spatial smoothing technique is not directly applicable.
Consequently, some provisions are necessary to enable initialisation of the
DOA estimation procedure at the very least. Obviously, the number of identifiable
(and initialised) DOA’s should be predefined and limited to

$m \ll M$    (2)


Thus,  when  non-uniform  antenna  geometry  is  optimised  for  DOA  estimation  of 
arbitrarily  correlated  sources,  the  following  (competing)  provisions  are  to  be  made: 

1.  A  certain  number  of  repetitive  sub-arrays  (partial  arrays)  are  embedded  into 
the original geometry, which allows for generalised spatial smoothing for a small predefined
number of fully correlated sources. This allowance excludes non-redundant
geometries  and  even  those  with  the  minimum  redundancy. 

2.  Given  some  number  of  redundancies,  we  have  to  achieve: 

(i)  A  fully  augmentable  geometry  to  provide  identifiability  for  the  maximal 
number  of  uncorrelated  (Gaussian)  sources. 

(ii)  Proper  dimensionality  of  the  (co-array)  manifold  of  the  co-array  for  the 
synthetic  partial  arrays  so  that  the  predefined  number  of  uncorrelated  (Gaussian) 
sources,  m,  is  exceeded. 

(iii)  The  maximal  total  aperture  as  well  as  the  maximal  aperture  for  the  co- 
array  of  the  synthetic  partial  array  to  enhance  DOA  estimation  accuracy  for  the 
independent  and  fully  correlated  sources  accordingly. 


2.  PARTIAL  ARRAYS 

Suppose  that  we  have  M  antenna  sensors  and  wish  to  construct  the  “optimal” 
NLA. Further suppose that the sensor positions $d = [0, d_2, d_3, \ldots, d_M]$ are restricted
to integer values (usually measured in half-wavelength units). The meaning
of  “optimal”  needs  to  be  defined  carefully  from  the  perspective  of  the  discussion 
above. 
We consider the DOA estimation problem for some small pre-specified maximum
number  of  coherent  signals  m  (of  arbitrary  configuration)  using  the  special  class  of 
partial-array  NLA  geometries  and  the  corresponding  generalised  spatial  smoothing 
(GSS)  algorithm. 

Let the co-sequence of an array d be its set of $M - 1$ consecutive intersensor
separations (i.e. differences), while its co-array is the sorted set of $M(M-1)/2$
differences. We define a partial array to be a group of nonuniform linear non-contiguous
sub-arrays of identical co-sequence structure [6]. Associated with each
partial array are its multiplicity $\kappa$ (number of occurrences or instances), order $\ell$
(number of co-sequence elements involved), and aperture $a$. A given NLA will have
n embedded partial arrays, with a total of N instances. The GSS technique may be
applied to an NLA provided it yields at least one partial array of multiplicity $\kappa \ge m$
and order $\ell \ge m$, where m is the number of fully correlated signals. Examples of
partial arrays and their properties are more fully discussed in [7].
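
The co-sequence and co-array definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, a redundancy is assumed to mean any pairwise difference beyond the first occurrence of its lag, and the 4-element array [0, 1, 4, 6] is a textbook minimum-redundancy example that does not appear in the paper.

```python
from itertools import combinations


def co_sequence(d):
    """Consecutive intersensor separations (M - 1 values)."""
    return [b - a for a, b in zip(d, d[1:])]


def co_array(d):
    """Sorted list of all M(M-1)/2 pairwise differences (d must be increasing)."""
    return sorted(b - a for a, b in combinations(d, 2))


def redundancies(d):
    """Assumed definition: differences that repeat an already-present lag."""
    lags = co_array(d)
    return len(lags) - len(set(lags))


def is_gap_free(d):
    """True if every lag from 1 up to the aperture occurs at least once."""
    aperture = max(d) - min(d)
    return set(co_array(d)) == set(range(1, aperture + 1))


# 10-element non-redundant Sverdlik array quoted in the example section
sverdlik = [0, 1, 6, 10, 23, 26, 34, 41, 53, 55]
# Illustrative 4-element minimum-redundancy array (not from the paper)
mra4 = [0, 1, 4, 6]

print(co_sequence(sverdlik))
print(len(co_array(sverdlik)), redundancies(sverdlik), is_gap_free(sverdlik))
print(redundancies(mra4), is_gap_free(mra4))
```

The Sverdlik array has 45 distinct lags over an aperture of 55, so it is non-redundant but not gap-free, whereas the small minimum-redundancy array covers every lag from 1 to 6 exactly once.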

The  GSS  algorithm  introduced  in  [6]  consists  of  an  initialisation  step  followed  by 
local  ML  refinement.  The  initialisation  step  is  based  on  the  PA-MUSIC  approach 
involving  all  appropriate  partial  arrays. 

Suppose that an NLA yields a total of N partial arrays, each of multiplicity $\kappa_i$,
order $\ell_i$ and aperture $a_i$ ($i = 1, \ldots, N$). Let $x_{ij}$ be an $(\ell_i + 1)$-variate snapshot
vector corresponding to the $j$-th instance ($j = 1, \ldots, \kappa_i$) of the $i$-th partial array. If
any instance of a partial array occurs as a mirror-image (i.e. in reverse order), then
the corresponding snapshot vector is obtained by reversing the order of antenna
samples and taking the complex conjugate of the vector. Thus for each partial
array we may define the $(\ell_i + 1) \times (\ell_i + 1)$ partial array covariance matrix by spatial
smoothing to be

$\hat{R}_i = \frac{1}{\kappa_i} \sum_{j=1}^{\kappa_i} x_{ij} x_{ij}^H$    (3)

Let $G_i$ be the noise eigen-subspace of $\hat{R}_i$; then $G_i$ consists of at least one eigenvector
(since $m \ll M$). The PA-MUSIC technique is:

find $\max_\theta f_{PA}(\theta) := \min_\theta \sum_{i=1}^{N} a_i^H(\theta)\, G_i G_i^H\, a_i(\theta)$    (4)

where $a_i(\theta)$ is the $(\ell_i + 1)$-variate manifold (“steering”) vector which corresponds to
the given partial array geometry. Evidently this approach eliminates non-coinciding
ambiguities. More specifically, the co-array of the synthetic partial array that is
constructed by all of the properly averaged covariance lags produced by all of the
partial arrays should have a manifold dimensionality that exceeds the predefined
number of fully correlated sources. Thus the effectiveness of DOA estimation delivered
by GSS is directly related to the number, variety and $\kappa$-$\ell$-$a$ properties of the
available partial arrays. For this reason, the sum $\sum_{i=1}^{N} a_i = A$ could be treated
as a cost function for antenna geometry optimisation. Details of the three-stage
optimisation approach appear in [7].
3. EXAMPLE RESULTS

The  following  example  illustrates  array  geometry  optimisation  results  for  an  M  = 
16 element array. The initial choice $M_1 = 10$ gives us the starting-point 10-element
non-redundant  Sverdlik  array  [8] 

df^  =  [0, 1, 6, 10, 23, 26, 34, 41, 53, 55] .  (5) 

The  exhaustive  tree  search  of  stage  two  yields  37  candidate  gap-free  geometries,  each 
with  14  elements  and  36  redundancies.  The  integer  programming  maximisation  of 
stage  three  finds  that  of  these  candidates,  one  in  particular  is  the  best  (in  the  search 
range $\ell = 3$ and $c \in [1, 18]$), since with the addition of two sensors (8, 19) it yields
the  16-element  NLA 

$d_{55} = [0, 1, 5, 6, 8, 10, 19, 23, 26, 34, 31, 41, 53, 55]$    (6)

having  the  maximal  cost  function  A  =  38467  (and  65  redundancies).  Thus  we  have 
partitioned our M = 16 elements in this example by {$M_1 = 10$, $M_2 = 4$, $M_3 = 2$}.
Note that this three-stage optimisation search took a few days of computing time on
a modern workstation, even with its rather modest search range. At this point,
we have no alternative but to assume that any NLA rich in partial arrays for a
restricted search set {$\ell$, c} will be similarly superior for a more expansive set.
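
The redundancy counts quoted above can be checked with a short calculation. It rests on two assumptions that the text does not spell out: a redundancy is counted as any pairwise sensor difference beyond the first occurrence of its lag, and a gap-free array of aperture $M_\alpha$ contains every lag from 1 to $M_\alpha$. The symbol $R$ is introduced here purely for illustration.

```latex
\begin{align*}
R &= \tfrac{1}{2} M (M - 1) - M_\alpha \\
M = 14,\ M_\alpha = 55 &: \quad R = 91 - 55 = 36 \\
M = 16,\ M_\alpha = 55 &: \quad R = 120 - 55 = 65
\end{align*}
```

Both values agree with the 36 and 65 redundancies reported for the 14-element candidates and the final 16-element array.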

Indeed, Table 1 shows the $\kappa$-$\ell$ distribution and Fig. 1 illustrates the $a$-distribution
of partial arrays for $d_{55}$ for the expanded search range $\ell \in [3, 5]$ and $c \in [1, 30]$,
whence we find A = 99441. This array performs better than the 16-element ULA
because of the large number of embedded partial arrays, each of significant aperture.
The minimum-redundancy array of comparable total aperture ($M_\alpha = 58$)
has 13 elements, so we could consider $d_{55}$ to be a type of “optimal” solution
by the introduction of only three additional elements to the minimum-redundancy
structure.

TABLE  1 

Partial array distribution by multiplicity ($\kappa$) and order ($\ell$) for $d_{55}$
for m = 3 and the search range $\ell \in [3, 5]$ and $c \in [1, 30]$.

FIG. 1. Aperture histogram of partial arrays embedded in $d_{55}$ (axis label: aperture $a$).

4. PRACTICAL APPLICATION: FREE-CHANNEL ADVICE FOR
HIGH-FREQUENCY SURFACE-WAVE RADARS

One of the areas for practical application of non-uniform linear array methodology
is in clear-channel advice for HF over-the-horizon radars, specifically for surface-wave
radars. In most realistic situations, selection of the operational frequency for
such radars should be performed with respect to the directional spectrum of
the external noise. Accordingly, clear-channel advice sub-systems should incorporate
DOA estimation capabilities. Current HF surface-wave radars usually exploit
uniform linear arrays with a “digital receiver per element” architecture in their main
target detection module. It is, however, too expensive to duplicate this architecture
for the clear-channel advice module. The number of digital receivers for such
a module is limited and should be used with the maximal external noise DOA
estimation efficiency.

Figure 2 illustrates the practical results on external noise DOA estimation, obtained
from the experimental surface-wave over-the-horizon facility located in Northern
Australia [9]. It depicts the MUSIC-generated DOA’s from a full 32-element
uniform antenna array (top) compared to a similar set of DOA’s produced from a
16-element non-uniform sub-array (middle) and from a 9-element sub-array (bottom).
Both the 16- and 9-element sub-arrays have been selected from the original
32-element ULA within the original aperture in a fully augmentable manner.

The figure illustrates DOA’s of multiple external noise sources as a
function of the repetition period (“sweep”) number. Direct comparison of these data
makes it clear that all three sets of DOA’s are operationally identical for this typical
frequency.


5.  SUMMARY 

We have considered a problem involving nonuniformly spaced linear array geometry
optimisation, in the context of enhancing the performance of modern super-resolution
techniques in spatial spectrum (DOA) estimation. This optimisation
problem  has  been  reduced  to  a  simplified  form  where  effective  techniques  can  be 
applied,  based  on  dynamic  programming  principles. 

Optimisation  efficiency  in  terms  of  spectral  (DOA)  estimation  accuracy  has  been 
analysed  elsewhere  [6] ,  and  in  most  cases  is  found  to  be  very  high  and  significantly 
superior  to  the  conventional  ULA  geometry  coupled  with  standard  MUSIC-type 
routines. 

Real-data  processing  involving  external  HF  interference  DOA  estimation  justifies 
the  choice  of  NLA  geometry  for  the  frequency  management  subsystem  of  modern 
HF  OTH  radars. 
FIG. 2. Effect of interference DOA estimation by thinning a linear array: (upper) full array
M  =  32,  (middle)  partially  thinned  M  =  16,  (lower)  highly  thinned  M  =  9. 
Abstract - This paper presents measures of performance of simplex signaling in a circular trellis-coded modulation
(CTCM)  scheme.  Background  is  given  on  both  CTCM  and  simplex  signaling.  The  CTCM  system  is  shown  to  give 
substantial  coding  gain  when  compared  to  conventional  BPSK.  Performance  is  also  shown  to  improve  as  trellis  size 
increases. 


I.  Introduction 

Trellis-coded  modulation  (TCM)  provides  a  means  by  which  band-limited  channels  can  reap  the  benefits  of  error  control 
coding  by  combining  coding  and  modulation  into  a  single  step.  This  coding/modulation  step  is  accomplished  by  integrating  a 
multi-level  or  multi-phase  signaling  constellation  with  a  state-oriented  encoder,  such  as  a  convolutional  encoder.  Where  a 
binary  block  code  might  use  a  binary  phase-shift  keyed  (BPSK)  modulator  to  transmit  each  of  the  coded  bits  individually,  a 
TCM  scheme  would  choose  one  signal  from  the  constellation  to  represent  a  number  of  the  coded  bits  at  one  time.  In  TCM  the 
tradeoff  is  decoder  complexity  for  coding  gain. 

Recently, circular trellis-coded modulation (CTCM) has been proposed [2-10]. Also known as high-dimensional trellis-coded
modulation (HDTCM), CTCM takes the basic concepts of TCM (such as signal partitioning) and applies them to achieve
coding  gain  on  a  power-limited  channel,  such  as  a  spread-spectrum  channel,  by  using  a  high-dimensional  signaling 
constellation.  A  CTCM  system  can  be  viewed  as  a  block  code  with  trellis  structure  in  the  sense  that  source  data  is  encoded 
block-by-block  independently.  Additionally,  CTCM  satisfies  a  so-called  “state  constraint,”  which  specifies  that  the  starting 
state  of  a  particular  source  data  block  must  equal  the  ending  state.  This  property  alleviates  the  need  to  set  tail  information  bits 
to  zero  to  drive  the  encoder  to  the  “all-zero”  state,  a  necessary  procedure  in  conventional  TCM. 

In  [10],  the  assignment  of  transmission  symbols  from  a  simplex  signaling  constellation  is  investigated  for  use  in  CTCM, 
where  a  source  alphabet  size  of  four  is  emphasized.  This  paper  takes  these  symbol  assignments  and  determines  performance 
criteria  in  the  form  of  distance  distributions  and  bounds  on  the  bit-error  performance  of  the  system.  It  will  be  shown  that  the 
CTCM  system  provides  substantial  coding  gain  when  compared  to  BPSK  modulation. 

II.  Background 

A.  Simplex  Signaling 

For arbitrary dimensions $N \ge M - 1$, M simplex signals exist for unit pulse energy if there exist M signals
$S = \{s_1, s_2, \ldots, s_M\}$ in N dimensions such that:

$|s_i - s_j| = \sqrt{2M}, \quad i \ne j, \quad 1 \le i, j \le M$    (1)

where $s_i = [s_{i1}, s_{i2}, \ldots, s_{iN}]$, and $s_{ik} \in \{-1, 0, 1\}$, $1 \le i \le M$, $1 \le k \le N$. A simplex signal constellation can be loosely defined as a
set of signals that are equidistant from each other and energy equivalent [11]. Simplex signals are not orthogonal, but they
achieve  the  same  error  probability  as  an  equally  likely  orthogonal  signaling  set  while  using  less  energy.  Hence,  simplex 
signaling  is  employed  when  transmission  energy  is  constrained. 

The three-dimensional simplex will be emphasized in this paper. It is desired to create simplex signals utilizing a number of
dimensions (usually much) greater than three. For example, one three-dimensional simplex signal might be

s = [1, 0, -1, 0, -1, 0, 0, 0]    (2)

In this case, eight dimensions exist and dimensions 1, 3, and 5 are occupied. Additionally, shorthand notation can be used to
describe simplex signals. Equation (2) can be denoted as [1, -3, -5], for example. This notation indicates non-zero pulses in
dimensions one, three, and five, and their associated polarities. There is only one simplex that utilizes only dimensions 1, 3,
and 5 and contains the signal in (2) as a member. Using shorthand notation, the matrix expression for this simplex is

S = [(1 3 5); (1 -3 -5); (-1 3 -5); (-1 -3 5)]^T.    (3)
Note that the negative of (3), and indeed of any simplex, is also a simplex. By using all 8 dimensions, many simplexes can
be formed that contain (2). E.g. S = [(1 -3 -5); (-1 4 6); (2 3 -6); (-2 -4 5)]^T, which is just one of dozens of
simplexes that contain (2) as a member.
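
The shorthand notation and the equidistance property can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the pairwise squared distance is taken to be 2M (8 for M = 4), and the third signal of the three-dimension simplex is taken as (-1 3 -5), which is the sign pattern that equidistance requires.

```python
from itertools import combinations


def expand(shorthand, n_dims):
    """Turn shorthand such as (1, -3, -5) into a length-n_dims vector of -1/0/+1."""
    vec = [0] * n_dims
    for entry in shorthand:
        vec[abs(entry) - 1] = 1 if entry > 0 else -1
    return vec


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def is_simplex(signals, n_dims):
    """Check equal energy and a common pairwise squared distance of 2M."""
    vecs = [expand(s, n_dims) for s in signals]
    m = len(vecs)
    energies = {sum(x * x for x in v) for v in vecs}
    dists = {squared_distance(u, v) for u, v in combinations(vecs, 2)}
    return len(energies) == 1 and dists == {2 * m}


# Simplex confined to dimensions 1, 3 and 5
simplex_135 = [(1, 3, 5), (1, -3, -5), (-1, 3, -5), (-1, -3, 5)]
# Simplex spread over all 8 dimensions that also contains (1, -3, -5)
simplex_8d = [(1, -3, -5), (-1, 4, 6), (2, 3, -6), (-2, -4, 5)]

print(expand((1, -3, -5), 8))
print(is_simplex(simplex_135, 8), is_simplex(simplex_8d, 8))
```

Replacing (-1 3 -5) by (-1 3 5) makes the check fail, because that signal lies at squared distance 4 from (1 3 5) instead of 8.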

B.  Circular  Trellis-Coded  Modulation 

The  trellis-coded  modulation  scheme  introduced  in  [2,3],  the  so-called  circular  trellis-coded  modulation  (CTCM),  is  the 
backbone  of  this  research.  A  straightforward  way  to  think  of  CTCM  is  as  a  block  code  with  trellis  structure.  That  is,  individual 
source  data  blocks  are  mapped  to  a  particular  path  through  the  trellis  instead  of  an  output  data  block.  The  mapping  of  the 
source  data  blocks  to  trellis  paths  is  one-to-one.  The  CTCM  scheme  is  characterized  by  the  following  four  parameters:  N 
(dimensionality  of  the  transmitted  signal  space),  n  (size  of  the  source  alphabet),  D  (trellis  depth),  and  B  (input  source  symbol 
sequence  length,  or  block  length). 

The  CTCM  system  can  be  described  by  these  four  parameters  in  the  form  (N,n,D,B).  The  number  of  trellis  states  in  CTCM  is 
determined by $S = n^D$. Source alphabet size of n=4 is emphasized in this paper. Trellis depth refers to the number of
transitions  needed  for  a  given  state  to  reach  any  other  state  in  the  trellis.  The  block  size  denotes  the  size  of  the  input  source 
block  as  well  as  the  number  of  transitions  in  a  legal  trellis  path.  The  block  size  must  be  greater  than  or  equal  to  the  trellis  depth 
plus one ($B \ge D + 1$).
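
The relationships between the four parameters can be captured in a small container. This is a sketch under stated assumptions: the class name `CTCMParams` and its properties are hypothetical, the state count is read as n to the power D (16 states for n = 4, D = 2), and the codeword count n to the power B follows from the one-to-one mapping of source blocks to trellis paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CTCMParams:
    N: int  # dimensionality of the transmitted signal space
    n: int  # size of the source alphabet
    D: int  # trellis depth
    B: int  # block length in source symbols

    def __post_init__(self):
        # The block length must be at least the trellis depth plus one.
        if self.B < self.D + 1:
            raise ValueError("block length must satisfy B >= D + 1")

    @property
    def num_states(self):
        """Number of trellis states, read as n ** D."""
        return self.n ** self.D

    @property
    def num_codewords(self):
        """One legal trellis path per source block of B symbols."""
        return self.n ** self.B


params = CTCMParams(N=8, n=4, D=2, B=3)
print(params.num_states, params.num_codewords)
```

For the 16-state trellis emphasized in this paper, the shortest admissible block (B = 3) already gives 64 legal paths.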

One  drawback  of  conventional  TCM  is  that  the  decoder  must  know  the  starting  state  of  the  encoder  before  transmission. 
This  is  usually  accomplished  by  “padding”  the  source  data  with  additional  zeros  to  force  the  encoder  to  an  all-zero  state  before 
additional  source  data  is  encoded.  CTCM  alleviates  this  problem  by  forcing  the  starting  and  ending  states  of  a  particular  block 
to  be  the  same.  This  property  of  CTCM  is  known  as  the  state  constraint  [2,7,8]. 

The  state  constraint  is  satisfied  through  proper  design  of  the  state  table.  A  state  table  lists,  for  every  state  in  the  trellis,  what 
the  next  state  transition  will  be  for  any  given  input  symbol.  Since  a  4-ary  source  is  emphasized  in  this  paper,  the  state  table  will 
have  a  number  of  rows  equal  to  the  number  of  states  in  the  trellis,  and  four  columns  which  correspond  to  each  of  the  four  source 
symbols.  The  design  of  the  state  table  is  achieved  through  the  use  of  a  Zech  logarithm  table,  and  is  discussed  in  detail  in  [2], 
which  also  presents  computer  code  capable  of  generating  state  tables  for  arbitrary  source  alphabet  size  n  and  trellis  depth  D. 
Only  one  state  table  exists  for  a  given  pair  of  n  and  D.  An  example  of  a  state  table  is  shown  in  Table  I. 

The  transmission  symbol  table  is  a  look-up  table  analogous  to  the  state  table.  Both  the  state  and  transmission  symbol  tables 
contain  entries  for  all  trellis  states  and  all  possible  source  symbols.  However,  where  the  state  table  defines  the  next  state 
transition  given  the  current  state  and  current  input,  the  transmission  symbol  table  describes  what  symbol  is  transmitted  when 
transiting  to  that  next  state.  Given  the  state  table  and  the  transmission  symbol  table,  the  trellis  is  completely  described.  A  trellis 
diagram  can  then  be  constructed  with  all  trellis  states,  their  associated  transitions,  and  the  related  symbols  [2,3]. 

III.  Simplex  Signaling  Performance  For  16-State  Trellis 

With  the  problem  of  symbol  assignment  addressed  in  [4,11],  the  performance  of  CTCM  using  simplex  signals  will  be 
explored  in  this  section.  Specifically,  a  16-state  CTCM  trellis  will  be  emphasized.  Two  methods  will  be  employed  in  the 
performance  analysis:  distance  distributions  and  bounds  on  the  bit  error  rate  of  the  system.  Performance  using  various  block 
sizes  will  be  compared.  The  bounds  on  the  bit  error  rate  will  also  be  compared  to  the  performance  of  BPSK. 

A.  Distance  Distributions 

One method of measuring the performance of a CTCM system is known as a “distance distribution.” A distance distribution
is a histogram-type distribution that is created by measuring the distance from all the legal paths in the trellis to one reference
path [3,5]. In general, there are $m_i$ paths with distance $d_i$ from the reference path. The distance value $d_0$ is usually zero and the
distance from the reference path to itself is zero, hence the value of $m_0$ is one. The first non-zero multiplicity (other than $m_0$)
occurs at what is called the minimum Euclidean distance, which is denoted by $d_{min}$.
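
A distance distribution of this kind can be sketched as a simple histogram. The example is a toy, not the paper's trellis: the four two-symbol codewords below are hypothetical sequences built from three unit-amplitude simplex signals, and the function names are illustrative.

```python
from collections import Counter


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length codewords."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def distance_distribution(codewords, reference):
    """Map each squared distance to the number of codewords at that distance."""
    return Counter(squared_distance(c, reference) for c in codewords)


def minimum_distance(distribution):
    """Smallest non-zero squared distance and its multiplicity."""
    d2 = min(d for d in distribution if d > 0)
    return d2, distribution[d2]


# Three signals of a three-dimensional simplex (unit pulse amplitude)
a = (1, 1, 1)
b = (1, -1, -1)
c = (-1, 1, -1)

# Hypothetical two-symbol codewords; the first one acts as the reference path
toy_codewords = [a + a, a + b, b + a, b + c]
reference = toy_codewords[0]

dist = distance_distribution(toy_codewords, reference)
print(sorted(dist.items()))
print(minimum_distance(dist))
```

The entry at distance zero is the reference path itself, so its multiplicity is one, and the first non-zero entry gives the minimum squared distance together with its multiplicity.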


TABLE I. CTCM State Table For 16-State Trellis (n=4, D=2).
The reference path is typically chosen to be the “all-zero” path. The all-zero path is the path taken through the trellis for a
source input of exclusively zero symbols. For a block size of B, a group of B consecutive zero source symbols maps to the all-zero
path. Note that, for a source alphabet size of n = 4, the zero source symbol corresponds to two zero bits.

The parameter known as $d_{free}$ refers to the free Euclidean distance of the trellis-coding scheme. This parameter is very
important in determining performance gain of any CTCM trellis [2,5]. $N_{free}$, which denotes the multiplicity of codewords (legal
trellis paths) with distance $d_{free}$ from the reference path, is equally important.

For a particular trellis, $d_{free}$ can be defined as [1]:


$d_{free} = \min_{c \ne c'} \left[ d^2(c, c') \right]^{1/2}$    (4)

where $c$ and $c'$ denote two codewords and the function $d^2$ denotes the squared Euclidean distance between the two
codewords. $d_{free}$ can be defined in a more straightforward manner as the minimum Euclidean distance between two legal
codewords that have the same starting state and ending state in a CTCM trellis [2]. For a given trellis and simplex signals with
unit pulse amplitude, this can be expressed as:

$d_{free} = \sqrt{8(D + 1)}$    (5)


where  D  denotes  the  depth  of  the  trellis.  Additionally,  recall  that  the  minimum  block  size  is  equal  to  the  trellis  depth  plus  one. 

In terms of the distance distribution, $d_{free}$ is the largest value $d_{min}$ can attain for a particular trellis. Obviously, it is desired to
force $d_{min}$ to reach $d_{free}$. Additionally, it is desired to minimize the multiplicity $N_{min}$ at distance $d_{min}$, and it is known that $d_{min}$
dominates the error performance for high signal-to-noise ratios [3,5]. Therefore, reaching the value of $d_{free}$ and minimizing the
value of $N_{min}$ for a particular trellis ensures optimal performance.

As  discussed  in  [4,11],  there  are  24  possible  transmission  symbol  tables  that  can  be  constructed  for  a  CTCM  trellis  with  a 
source alphabet size of n = 4. Using a transmitted signal space dimensionality of N = 8 and 10, all possible symbol assignments
were  determined  by  using  the  computer  program  in  [4].  Distance  distributions  were  then  calculated  for  each  of  the  24  possible 
symbol  assignments. 

In [6], a 12-dimensional assignment was investigated. Table II lists values for $N_{min}$ and $d^2_{min}$ for this 12-dimensional
assignment, as well as the two distributions (for N = 8 and 10) that reach $d_{free}$ discussed in [4]. Distributions for block sizes of
B = 7 and B = 8 were not included in [6].

Through investigation of Table II it can be seen that the 12-dimensional case reaches $d_{free}$ at a block size of B = 6, which is
smaller than the block sizes required for either the 8- or 10-dimensional cases. However, when comparing the values of $N_{min}$ for
the three cases, the 12-dimensional value of $N_{min}$ is significantly higher than in the other two cases. It can then be concluded that
the 8- and 10-dimensional assignments achieve better performance than the 12-dimensional assignment while using less
bandwidth. In the 8-dimensional case, the bandwidth reduction is 1/3.
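
The comparison above can be reproduced from the tabulated values. The sketch rests on several assumptions: equation (5) is read as a squared free distance of 8(D + 1), the entries of Table II are read column by column (block size first, then dimensionality), bandwidth is taken to scale with the signal space dimensionality, and all function and variable names are hypothetical.

```python
from fractions import Fraction


def d_free_squared(depth):
    """Squared free distance 8(D + 1) for unit-amplitude simplex signals."""
    return 8 * (depth + 1)


# (d_min^2, N_min) per block size B, as read from Table II (16 states, D = 2)
TABLE_II = {
    8: {3: (12, 4), 4: (14, 4), 5: (20, 16), 6: (20, 6), 7: (24, 28), 8: (24, 32)},
    10: {3: (12, 1), 4: (14, 8), 5: (20, 21), 6: (22, 12), 7: (24, 21), 8: (24, 24)},
    12: {3: (12, 1), 4: (16, 1), 5: (20, 1), 6: (24, 37)},
}


def first_block_reaching_free_distance(rows, depth):
    """Smallest block size whose d_min^2 equals the squared free distance."""
    target = d_free_squared(depth)
    hits = [b for b, (d2, _) in sorted(rows.items()) if d2 == target]
    return hits[0] if hits else None


def bandwidth_reduction(n_small, n_large):
    """Relative saving, assuming bandwidth is proportional to dimensionality."""
    return Fraction(n_large - n_small, n_large)


for dims, rows in TABLE_II.items():
    b = first_block_reaching_free_distance(rows, depth=2)
    print(dims, b, rows[b][1])
print(bandwidth_reduction(8, 12))
```

Under this reading, the 12-dimensional assignment reaches the free distance one block size earlier, at B = 6, but with a larger multiplicity than the 8- and 10-dimensional assignments show at B = 7.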


B. Bit-Error Performance


Bit-error probabilities cannot be computed for CTCM in a closed form. To obtain exact $P_b$ curves, the system would have
to be simulated. However, bounds on the bit-error probability can be computed, which will give a rough estimate of system
performance. Additionally, the actual curves will asymptotically approach the upper bound on $P_b$ for high signal-to-noise
ratios [3]. The upper bound on $P_b$ was derived in [3] and the resulting equation is shown in (6):


[Equation (7): union bound on $P_b$, a sum over the M legal codewords, as derived in [3]]    (7)

where B is the block size, k is the number of bits per symbol, $\Delta b_{1j}$ is the number of mismatched bits in the source
sequences associated with codeword 1 and codeword j, $d_{1j}$ is the Euclidean distance between codewords 1 and j, SNR is the
signal-to-noise ratio, M is the number of legal codewords, and $Q(\cdot)$ represents the Gaussian Q-function. Equation (7) gives
a performance benchmark for the system. It will be used to determine system performance for various block sizes and symbol
assignments.
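
As a minimal sketch of how a union bound of this kind can be evaluated numerically, the Python function below sums one pairwise term per competing codeword. The argument of the Q-function used here, sqrt(d^2 * SNR / 2), is an assumption made for illustration, as are the names `union_bound`, `delta_bits` and `dist_sq`; the exact normalization is the one derived in [3].

```python
import math

def q_function(x):
    """Gaussian Q-function, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bound(delta_bits, dist_sq, snr, k, block_size):
    """Union bound on the bit-error probability.

    delta_bits[i] : mismatched source bits between codeword 1 and competitor i
    dist_sq[i]    : squared Euclidean distance between codeword 1 and competitor i
    snr           : signal-to-noise ratio (linear scale, not dB)
    k, block_size : bits per symbol and block size B

    ASSUMPTION: each pairwise term is Q(sqrt(d^2 * snr / 2)); the scaling
    inside Q depends on the signal normalization used in [3].
    """
    total = 0.0
    for db, d2 in zip(delta_bits, dist_sq):
        total += db * q_function(math.sqrt(d2 * snr / 2.0))
    return total / (k * block_size)
```

Sweeping `snr` over a range of values and plotting the result on a logarithmic axis would give curves of the kind described for Fig. 1, under the stated assumption.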

TABLE II.  Comparison of Minimum Distance Values and Multiplicities for 16-State Trellis.
Each entry gives $d^2_{\min}$ followed by $N_{\min}$.

Dim.       B = 3      B = 4      B = 5      B = 6      B = 7      B = 8
N = 8      12   4     14   4     20  16     20   6     24  28     24  32
N = 10     12   1     14   8     20  21     22  12     24  21     24  24
N = 12     12   1     16   1     20   1     24  37      -   -      -   -

Fig. 1 shows the best union bound curve (as determined by testing all 24 symbol assignments) for block sizes of B = 3
through 7, using N = 8 dimensions in signaling. The error probability improvement with increasing block size is easily seen
from this plot. This is in agreement with Shannon's theorem, in that the more data that is encoded at a time, the better the
error performance will be.

Additionally, as expected, the error performance asymptotically approaches a limit. That is, the improvement in the $P_b$
union bound in going from B = 3 to B = 4 is far greater than the improvement shown in going from B = 6 to B = 7. The
performance of BPSK is also shown in the plot as a reference point. For B = 7, the CTCM scheme provides a gain of
approximately 5 dB over BPSK.

IV.  Simplex  Signaling  Performance  For  64-State  Trellis 

A.  Distance  Distributions 

Distance distributions for the 64-state trellis were calculated in the same manner as the 16-state distributions discussed in
Section III. From (6), it can be seen that the free Euclidean distance is $d_{free} = \sqrt{32}$ for the 64-state trellis, an
increase over the value of $d_{free} = \sqrt{24}$ for the 16-state case.

Dimensionality values of N = 10, 12, 16, 20, and 24 can be used in construction of the transmission symbol table for a
64-state trellis. Each of these five N values was used in the construction of transmission symbol tables and their
corresponding distance distributions. As in the 16-state case, several "good" distributions existed for each combination of
block size and dimensionality. Table III lists the values of $d^2_{\min}$ and $N_{\min}$ for the best distribution for each
combination of N and B.

In examining Table III, the primary characteristic that is noticed is that there is no distribution, for any combination of N
and B, that reaches the free Euclidean distance value $d_{free}$. It is not clear why this is the case, although there are a
number of possible explanations [4]. The most obvious explanation is that the block sizes tested have not increased to the
point that $d_{free}$ can be attained. That is, certain assignments may reach $d_{free}$ for larger block sizes than those
tested, say, B = 20. However, this imposes a huge computational limitation, since the run-time for the B = 10 cases is on the
order of days.


B.  Bit-Error  Performance 


The union bound on the probability of bit error for the 64-state trellis can be computed in the same manner as for the
16-state trellis by utilizing (7). For comparison purposes, Fig. 2 shows union bound curves for a block size of B = 7. The
plot compares performance using 10-dimensional and 24-dimensional signaling for the 64-state trellis. Additionally, the union
bound curve for the best 16-state case (from Section III, N = 8) is shown, along with the BPSK performance curve.

Since the 64-state trellis gives a wide range in the number of dimensions that can be utilized in constructing the
transmission symbol table, the two extremes of N = 10 and N = 24 were chosen to be included in Fig. 2. The plots are culled
from the symbol assignments that yield the best distance distributions for the respective dimensionalities. Improvement over
the 16-state trellis is easily noticed from the plot. Additionally, the 24-dimensional case offers improvement over the
10-dimensional case, with a gain of over 0.5 dB present at a block size of B = 7 and a $P_b$ value of 10’®.


Fig. 1. Union bound on bit-error probability for 16-state trellis (B = 3 through 7), and BPSK.


TABLE III.  Comparison of Minimum Distance Values and Multiplicities for 64-State Trellis.
Each entry gives $d^2_{\min}$ followed by $N_{\min}$.

Block Size    N = 10     N = 12     N = 16     N = 20     N = 24
B = 4         16   9     16   4     16   4     16   4     20  32
B = 5         20  11     20  25     22  35     22  10     22  10
B = 6         20   3     24  21     24   3     24   3     24   3
B = 7         24   7     28  21     28  14     30  21     30  14
B = 8         28  24     30  16     28   8     30  16     30  16
B = 9         28  18     30  18     28   9     30  18     30  18
B = 10        28  10     30  20     28  10     30  20     30  20

Whether or not this 0.5 dB improvement is significant would depend on the specific design problem. Since 24-dimensional
signaling requires 2.4 times the bandwidth of the 10-dimensional case (a 140% increase), the bandwidth/performance trade-off
would need to be weighed carefully. However, since the improvement does exist, an increase in the dimensionality of the
transmitted signal space is a viable route that can be explored to improve the bit-error rate of the system.

The 64-state trellis demonstrates nearly 5.5 dB of gain over BPSK at B = 7. At larger block sizes, if the minimum distance
value moves closer to $d_{free}$, the improvement over BPSK would increase further. Additionally, by increasing the size of
the trellis (to 256 states, etc.), and thus the value of $d_{free}$, further improvement would be seen. However, the drawback
of increasing the block size and the size of the trellis is that the decoding algorithm would also increase in complexity, if
throughput is to be maintained [2].


Fig. 2. BPSK performance and bit-error probability union bounds for B = 7 (S = 16 and 64, N = various).


V.  Conclusions 

This paper has examined the performance of simplex signaling in a circular trellis-coded modulation scheme. Both Euclidean
distance distributions and bounds on bit-error performance were examined as performance benchmarks. Performance was shown to
improve as parameters such as block size and trellis size are increased.

Spatial time-frequency distributions (STFDs) have recently been shown
to be a powerful tool for solving direction finding and blind source
separation problems for multi-sensor array receivers. These spatial distributions
are the natural means to deal with source signals that are localizable in
the time-frequency domain. This paper examines the eigenstructure of the
spatial time-frequency distribution matrices. It is shown that improved
estimates of the signal and noise subspaces are achieved by constructing the
subspaces from the time-frequency signatures of the signal arrivals rather
than from the data covariance matrices. This improvement is more evident
in low signal-to-noise ratio (SNR) environments and in the case of closely
spaced sources. The paper considers the MUSIC technique to demonstrate
the advantages of STFDs and uses it as grounds for comparison between
time-frequency and conventional subspace estimates.


Key  Words:  time-frequency  analysis;  subspace  analysis;  time-frequency  MUSIC;  spatial 
time-frequency  distributions;  array  signal  processing 


1.  INTRODUCTION 

Although the applications of spatial time-frequency distributions to blind
source separation and DOA problems using multiple antenna arrays in
nonstationary environments have been introduced in [1,2], so far there has not been
sufficient analysis that explains their offerings and justifies their performance. The
aim of this paper is to examine the eigenstructure of the spatial time-frequency
distribution matrices and provide statistical analysis of their respective signal and
noise subspaces. The paper shows that the subspaces obtained from the STFDs are
robust to both noise and angular separation of the waveforms incident on the array.
This robustness is primarily due to spreading the noise power while localizing the
source energy in the time-frequency domain. By forming the STFD matrices from
the points residing on the source time-frequency signatures, we, in essence, increase
the input signal-to-noise ratio, and hence improve the subspace estimates.

This paper is organized as follows. Section 2 presents the signal model and gives
a brief review of the definition and basic properties of the spatial time-frequency
distributions. In Section 3, we consider a nonstationary environment characterized
by frequency-modulated (FM) source signals, and show the potential improvement
in direction-of-arrival (DOA) estimation using STFDs. Section 4 examines the
performance of the direction finding MUSIC technique based on the covariance and
STFD noise subspace estimates.


2.  BACKGROUND 
2.1.  Signal  Model 

In narrowband array processing, when n signals arrive at an m-element array,
the linear data model

    $\mathbf{x}(t) = \mathbf{y}(t) + \mathbf{n}(t) = \mathbf{A}(\Theta)\mathbf{d}(t) + \mathbf{n}(t)$    (1)

is commonly assumed, where the m x n spatial matrix $\mathbf{A}(\Theta) = [\mathbf{a}(\theta_1) \ldots \mathbf{a}(\theta_n)]$
represents the mixing matrix or the steering matrix, and $\mathbf{a}(\theta_i)$ are the steering
vectors. Due to the mixture of the signals at each sensor, the elements of the m x 1
data vector $\mathbf{x}(t)$ are multicomponent signals, whereas each source signal $d_i(t)$ of
the n x 1 signal vector $\mathbf{d}(t)$ is often a monocomponent signal. $\mathbf{n}(t)$ is an additive
noise vector whose elements are modeled as stationary, spatially and temporally
white, zero-mean complex random processes, independent of the source signals.
That is,

    $E[\mathbf{n}(t+\tau)\mathbf{n}^H(t)] = \sigma\delta(\tau)\mathbf{I}$  and  $E[\mathbf{n}(t+\tau)\mathbf{n}^T(t)] = \mathbf{0}$  for any $\tau$    (2)

where $\delta(\tau)$ is the Kronecker delta function, $\mathbf{I}$ denotes the identity matrix, $\sigma$ is
the noise power at each sensor, superscripts $^H$ and $^T$ respectively denote conjugate
transpose and transpose, and $E(\cdot)$ is the statistical expectation operator.

In equation (1), it is assumed that the number of sensors is larger than the
number of sources, i.e., m > n. Further, matrix $\mathbf{A}$ is full column rank, which
implies that the steering vectors corresponding to n different angles of arrival are
linearly independent. We further assume that the correlation matrix

    $\mathbf{R}_{xx} = E[\mathbf{x}(t)\mathbf{x}^H(t)]$    (3)

is nonsingular, and the observation period consists of N snapshots with N > m.
Under the above assumptions, the correlation matrix is given by

    $\mathbf{R}_{xx} = E[\mathbf{x}(t)\mathbf{x}^H(t)] = \mathbf{A}(\Theta)\mathbf{R}_{dd}\mathbf{A}^H(\Theta) + \sigma\mathbf{I}$,    (4)

where $\mathbf{R}_{dd} = E[\mathbf{d}(t)\mathbf{d}^H(t)]$ is the signal correlation matrix. For notational
convenience, we drop the argument $\Theta$ in equation (1) and simply use $\mathbf{A}$ instead of
$\mathbf{A}(\Theta)$.


2.2.  Spatial  Time-Frequency  Distributions 

The spatial time-frequency distributions (STFDs) based on Cohen's class of
time-frequency distributions were introduced in [1], and their applications to direction
finding and blind source separation have been discussed in [2] and [1], respectively. In
this paper, we focus on one key member of Cohen's class, namely the pseudo
Wigner-Ville distribution (PWVD), and its respective spatial distribution. Only the
time-frequency (t-f) points in the autoterm regions of the PWVD are considered for STFD
matrix construction. In these regions, it is assumed that the crossterms are
negligible. The discrete form of the pseudo Wigner-Ville distribution of a signal x(t), using
a rectangular window of odd length L, is given by

    $D_{xx}(t,f) = \sum_{\tau=-(L-1)/2}^{(L-1)/2} x(t+\tau)\,x^*(t-\tau)\,e^{-j4\pi f\tau}$,    (5)

where * denotes complex conjugation. The spatial pseudo Wigner-Ville distribution
(SPWVD) matrix is obtained by replacing x(t) by the data snapshot vector $\mathbf{x}(t)$:

    $\mathbf{D}_{xx}(t,f) = \sum_{\tau=-(L-1)/2}^{(L-1)/2} \mathbf{x}(t+\tau)\,\mathbf{x}^H(t-\tau)\,e^{-j4\pi f\tau}$.    (6)
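
To make the definition in (6) concrete, the following is a minimal pure-Python sketch that evaluates the SPWVD matrix at a single (t, f) point. The function name `spwvd_point`, the list-of-lists data layout, and the use of a normalized frequency in cycles per sample are assumptions made for illustration.

```python
import cmath

def spwvd_point(snapshots, t, f, L):
    """Spatial pseudo Wigner-Ville matrix D_xx(t, f) as in (6).

    snapshots : list of N snapshot vectors, each a list of m complex samples
    t         : time index at which the distribution is evaluated
    f         : normalized frequency (cycles per sample, an assumed convention)
    L         : odd length of the rectangular window
    """
    if L % 2 == 0:
        raise ValueError("window length L must be odd")
    half = (L - 1) // 2
    if t - half < 0 or t + half >= len(snapshots):
        raise ValueError("window exceeds the data record")
    m = len(snapshots[0])
    D = [[0j] * m for _ in range(m)]
    for tau in range(-half, half + 1):
        kernel = cmath.exp(-4j * cmath.pi * f * tau)
        lead, lag = snapshots[t + tau], snapshots[t - tau]
        for row in range(m):
            for col in range(m):
                # outer product x(t + tau) x^H(t - tau), weighted by the kernel
                D[row][col] += lead[row] * lag[col].conjugate() * kernel
    return D
```

For a single complex exponential arriving with steering vector a, evaluating this function at the signal frequency returns L times the outer product of a with its conjugate, which is the single-source form of the signal term in (8).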


Substituting (1) into (6), we obtain

    $\mathbf{D}_{xx}(t,f) = \mathbf{D}_{yy}(t,f) + 2\mathrm{Re}[\mathbf{D}_{yn}(t,f)] + \mathbf{D}_{nn}(t,f)$.    (7)

We note that $\mathbf{D}_{xx}(t,f)$, $\mathbf{D}_{yy}(t,f)$, $\mathbf{D}_{yn}(t,f)$, $\mathbf{D}_{ny}(t,f)$, and $\mathbf{D}_{nn}(t,f)$ are matrices
of dimension m x m, whereas the source TFD matrix $\mathbf{D}_{dd}(t,f)$ is of dimension
n x n. Under the uncorrelated signal and noise assumption and the zero-mean
noise property, the expectation of the crossterm TFD matrices between the signal
and noise vectors is zero, i.e., $E[\mathbf{D}_{yn}(t,f)] = E[\mathbf{D}_{ny}(t,f)] = \mathbf{0}$, and it follows that

    $E[\mathbf{D}_{xx}(t,f)] = \mathbf{D}_{yy}(t,f) + E[\mathbf{D}_{nn}(t,f)] = \mathbf{A}\mathbf{D}_{dd}(t,f)\mathbf{A}^H + E[\mathbf{D}_{nn}(t,f)]$.    (8)



For  narrowband  array  signal  processing  applications,  the  mixing  matrix  A  holds 
the  spatial  information  and  maps  the  auto-  and  cross-TFDs  of  the  source  signals 
into  auto-  and  cross-TFDs  of  the  data. 

It is noted that relationship (8) holds true for every (t, f) point. In order to
reduce  the  effect  of  noise  and  ensure  the  full  column  rank  property  of  the  STFD 
matrix,  we  consider  multiple  time-frequency  points.  This  allows  more  information 
of  the  source  signal  t-f  signatures  to  be  included  into  their  respective  subspace 
formulation.  Joint-diagonalization  [3]  and  time-frequency  averaging  are  the  two 
main  approaches  that  have  been  used  for  this  purpose  [1,  2,  4].  In  this  paper,  we 
only  consider  averaging  over  multiple  time-frequency  points. 

3.  SUBSPACE  ANALYSIS  FOR  FM  SIGNALS 

In this paper, we focus on frequency modulated (FM) signals, modeled as

    $\mathbf{d}(t) = [d_1(t), \ldots, d_n(t)]^T = [D_1 e^{j\psi_1(t)}, \ldots, D_n e^{j\psi_n(t)}]^T$,    (9)

where $D_i$ and $\psi_i(t)$ are the amplitude and phase of the ith source signal. For each
sampling time t, $d_i(t)$ has an instantaneous frequency $f_i(t) = \frac{1}{2\pi}\frac{d\psi_i(t)}{dt}$. To simplify
the analysis, we assume that the FM signals are mutually uncorrelated over the
observation period. That is,

    $\frac{1}{N}\sum_{k=1}^{N} d_i(k)\,d_j^*(k) = 0$  for $i \neq j$, $i,j = 1, \ldots, n$.    (10)

In this case, the signal correlation matrix in (4) is

    $\mathbf{R}_{dd} = \mathrm{diag}[D_i^2,\ i = 1, 2, \ldots, n]$,

where diag[·] is the diagonal matrix formed with the elements of its vector-valued
argument. The ith diagonal element of the TFD matrix $\mathbf{D}_{dd}(t,f)$ in (8) is given by

    $D_{d_i d_i}(t,f) = \sum_{\tau=-(L-1)/2}^{(L-1)/2} D_i^2\, e^{j[\psi_i(t+\tau) - \psi_i(t-\tau)] - j4\pi f\tau}$.    (11)

Assuming that the third-order derivative of the phase is negligible over the window
length L, then along the instantaneous frequency $f_i = \frac{1}{2\pi}\frac{d\psi_i(t)}{dt}$ we have
$\psi_i(t+\tau) - \psi_i(t-\tau) - 4\pi f_i\tau = 0$. Accordingly,

    $D_{d_i d_i}(t,f_i) = \sum_{\tau=-(L-1)/2}^{(L-1)/2} D_i^2 = L D_i^2$.    (12)

Similarly, the noise STFD matrix $\mathbf{D}_{nn}(t,f)$ is

    $\mathbf{D}_{nn}(t,f) = \sum_{\tau=-(L-1)/2}^{(L-1)/2} \mathbf{n}(t+\tau)\,\mathbf{n}^H(t-\tau)\,e^{-j4\pi f\tau}$.    (13)

Under the assumption of temporally and spatially white noise, the statistical
expectation of $\mathbf{D}_{nn}(t,f)$ is given by

    $E[\mathbf{D}_{nn}(t,f)] = \sum_{\tau=-(L-1)/2}^{(L-1)/2} E[\mathbf{n}(t+\tau)\,\mathbf{n}^H(t-\tau)]\,e^{-j4\pi f\tau} = \sigma\mathbf{I}$.    (14)

Therefore, when we select the time-frequency points along the t-f signature or the
IF of an FM signal, the SNR in model (8) is $L D_i^2/\sigma$, which is improved by a factor
of L over the one associated with model (4).
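
The factor-of-L gain can be checked numerically. The sketch below, with hypothetical helper names `chirp`, `instantaneous_frequency` and `pwvd`, builds a linear FM signal (for which the third-order phase derivative is exactly zero) and evaluates its PWVD on and off the instantaneous frequency. Frequencies are expressed in cycles per sample, which is an assumed convention.

```python
import cmath
import math

def chirp(n_samples, f_start, f_end, amplitude=1.0):
    """Linear FM signal D * exp(j * psi(t)) sweeping f_start -> f_end (cycles/sample)."""
    rate = (f_end - f_start) / n_samples
    return [amplitude * cmath.exp(2j * math.pi * (f_start * t + 0.5 * rate * t * t))
            for t in range(n_samples)]

def instantaneous_frequency(t, n_samples, f_start, f_end):
    """Instantaneous frequency (1 / 2 pi) d psi / dt of the chirp above."""
    return f_start + (f_end - f_start) / n_samples * t

def pwvd(x, t, f, L):
    """Discrete PWVD with a rectangular window of odd length L."""
    half = (L - 1) // 2
    return sum(x[t + tau] * x[t - tau].conjugate() * cmath.exp(-4j * math.pi * f * tau)
               for tau in range(-half, half + 1))

# On the instantaneous frequency the PWVD equals L * D^2; away from it the
# terms no longer add coherently and the magnitude is much smaller.
N, L, D = 256, 33, 2.0
x = chirp(N, 0.0, 0.5, amplitude=D)
f_i = instantaneous_frequency(100, N, 0.0, 0.5)
on_ridge = pwvd(x, 100, f_i, L)
off_ridge = pwvd(x, 100, f_i + 0.1, L)
```

Since the expected noise contribution in (14) does not grow with L, the ratio of `abs(on_ridge)` to the noise power grows linearly with the window length, which is the improvement stated above.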

The pseudo Wigner-Ville distribution of each FM source has a constant value
over the observation period, provided that we leave out the rising and falling
power distributions at both ends of the data record. For convenience of analysis,
we select those N - L + 1 t-f points of constant distribution value for each source
signal. Therefore, the averaged STFD over the time-frequency signatures of $n_o$
signals, i.e., $n_o(N-L+1)$ t-f points, is given by

    $\hat{\mathbf{D}} = \frac{1}{n_o(N-L+1)} \sum_{q=1}^{n_o} \sum_{i=1}^{N-L+1} \mathbf{D}_{xx}(t_i, f_{q,i})$,    (15)

where $f_{q,i}$ is the instantaneous frequency of the qth signal at the ith time sample.
The expectation of the averaged STFD matrix is

    $\mathbf{D} = E[\hat{\mathbf{D}}] = \frac{1}{n_o}\sum_{q=1}^{n_o}\left[L D_q^2\,\mathbf{a}(\theta_q)\mathbf{a}^H(\theta_q) + \sigma\mathbf{I}\right] = \frac{L}{n_o}\mathbf{A}^o\mathbf{R}^o_{dd}(\mathbf{A}^o)^H + \sigma\mathbf{I}$,    (16)

where $\mathbf{R}^o_{dd}$ and $\mathbf{A}^o$, respectively, represent the signal correlation matrix and the
mixing matrix constructed by only considering $n_o$ signals out of the total number
of signal arrivals n.

It is clear from (16) that, when $n_o$ signals are selected, the SNR improvement
becomes $G = L/n_o$ (we assume $L > n_o$ throughout this paper). Therefore, from
the SNR perspective, it is better to select (t, f) points that belong to individual
signals, and to separately evaluate the respective STFD matrices. Accordingly,
STFD-based direction finding is, in essence, a discriminatory technique in the sense
that it does not require simultaneous localization and extraction of all unknown
signals received by the array. With STFDs, direction finding can be performed
using STFDs of a subclass of the impinging signals with specific time-frequency
signatures. In this respect, the proposed direction finding technique acts as a
spatial filter, removing all other signals from consideration, and subsequently saves
any downstream processing that is required to separate interference and signals
of interest. It is also important to note that with the ability to construct the
STFD matrix from one or a few signal arrivals, the well known m > n condition
on source localization using arrays can be relaxed, i.e., we can perform direction
finding or source separation with the number of array sensors smaller than the
number of impinging signals [5]. From the angular resolution perspective, closely
spaced sources with different t-f signatures can be resolved by constructing two
separate STFDs, each corresponding to one source, and then proceeding with subspace
decomposition for each STFD matrix separately, followed by an appropriate source
localization method (MUSIC, for example). The drawback of performing direction
finding several times using different STFD matrices is clearly the need for repeated
computations of eigendecompositions and source localizations.

4.  SIMULATIONS 

The t-f MUSIC is introduced in [2], where the angles of arrival are estimated by
locating the highest peaks of the spectrum obtained using the noise subspace of
the STFD matrix, rather than that of the covariance matrix, which is the case in
conventional MUSIC.
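
As a minimal sketch of the MUSIC step shared by both variants, the function below computes a pseudo-spectrum from the noise subspace of a Hermitian matrix, which may be either a covariance matrix or an averaged STFD matrix. It assumes NumPy, a half-wavelength uniform linear array, and a particular sign convention for the steering vector; the names `ula_steering` and `music_spectrum` are illustrative only.

```python
import numpy as np

def ula_steering(theta_deg, m):
    """Steering vector of an m-sensor half-wavelength ULA (sign convention assumed)."""
    k = np.arange(m)
    return np.exp(-1j * np.pi * k * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(R, n_sources, angles_deg):
    """MUSIC pseudo-spectrum from the noise subspace of the m x m matrix R."""
    m = R.shape[0]
    R = 0.5 * (R + R.conj().T)          # enforce Hermitian symmetry
    _, vecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    noise = vecs[:, : m - n_sources]    # eigenvectors of the smallest eigenvalues
    spectrum = np.empty(len(angles_deg))
    for i, theta in enumerate(angles_deg):
        proj = noise.conj().T @ ula_steering(theta, m)
        # small constant avoids division by zero at an exact null
        spectrum[i] = 1.0 / (np.real(np.vdot(proj, proj)) + 1e-12)
    return spectrum
```

Passing an estimate of the covariance matrix gives conventional MUSIC, while passing an averaged STFD matrix gives the t-f variant; the peak search over `angles_deg` is the same in both cases.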

The following example compares the performance of conventional and t-f MUSIC.
Consider a uniform linear array of 8 sensors separated by half a wavelength. Two
chirp signals are emitted from two sources positioned at angles $\theta_1$ and $\theta_2$. The start
and end frequencies of the chirp signal of the source at $\theta_1$ are $\omega_{s1} = 0$ and $\omega_{e1} = \pi$,
while the corresponding two frequencies for the signal of the other source at $\theta_2$ are
$\omega_{s2} = \pi$ and $\omega_{e2} = 0$, respectively. The noise used in this simulation is zero-mean,
Gaussian distributed, and temporally white. The noise power, $\sigma$, is adjusted to
give the desired SNR $= -10\log(\sigma)$.
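
A data set of this kind could be generated along the following lines. This is a sketch under stated assumptions: NumPy is used, the sources have unit amplitude, the logarithm in the SNR definition is taken as base 10, the steering-vector sign convention is chosen arbitrarily, and the number of snapshots is left as a parameter. The name `simulate_snapshots` is hypothetical.

```python
import numpy as np

def simulate_snapshots(n_snapshots, snr_db, thetas_deg=(-10.0, 10.0), m=8, seed=0):
    """Array data x(t) = A d(t) + n(t) for two crossing chirps on an m-sensor ULA."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_snapshots)
    sweep = np.pi / n_snapshots                    # chirp rate in rad/sample^2
    # Source 1 sweeps 0 -> pi rad/sample, source 2 sweeps pi -> 0 rad/sample.
    d1 = np.exp(1j * (0.5 * sweep * t**2))
    d2 = np.exp(1j * (np.pi * t - 0.5 * sweep * t**2))
    D = np.vstack([d1, d2])
    k = np.arange(m)[:, None]
    sines = np.sin(np.deg2rad(np.asarray(thetas_deg)))[None, :]
    A = np.exp(-1j * np.pi * k * sines)            # m x 2 steering matrix
    sigma = 10.0 ** (-snr_db / 10.0)               # SNR = -10 log10(sigma)
    noise = np.sqrt(sigma / 2.0) * (rng.standard_normal((m, n_snapshots))
                                    + 1j * rng.standard_normal((m, n_snapshots)))
    return A @ D + noise, sigma
```

The returned matrix has one column per snapshot, so a sample covariance estimate is obtained as `X @ X.conj().T / n_snapshots`.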

Fig. 1 displays the variance of the estimated DOA $\hat{\theta}_1$ versus SNR for the case
$(\theta_1, \theta_2) = (-10°, 10°)$. The curves in this figure show the theoretical and
experimental results of the conventional MUSIC and t-f MUSIC (for L = 33 and 129). The
CRB is also shown in Fig. 1. Both impinging signals are selected when performing
t-f MUSIC ($n_o = n = 2$). We assume that the number of signals is correctly
estimated for each case. Simulation results are averaged over 100 independent
Monte Carlo trials. The advantages of t-f MUSIC in low SNR cases are
evident from this figure.


5.  CONCLUSIONS 

The advantages of STFD-based direction finding over traditional direction finding
methods using data covariance matrices were demonstrated using the MUSIC
algorithm. The t-f MUSIC technique outperforms the conventional MUSIC technique
in the two situations of low SNR and closely spaced sources. Detailed performance
analysis of DOA-based STFD is given in reference [6].

We consider the problem of adaptive reception of a multipath
Direct-Sequence Spread-Spectrum (DS-SS) signal in the presence of unknown
correlated SS interference and additive impulsive noise. The proposed SS
receiver structure consists of a vector of adaptive chip-based Hampel
non-linearities followed by an adaptive Auxiliary-Vector linear tap-weight
filter. The non-linear receiver front-end adapts itself to the unknown
prevailing noise environment, providing robust performance over a wide range
of underlying noise distributions. The adaptive Auxiliary-Vector linear
tap-weight filter allows rapid SS interference suppression with a limited data
record. Numerical and simulation studies offer comparisons with the
conventional Minimum-Variance-Distortionless-Response (MVDR) SS receiver
[6]-[9] as well as MVDR filtering preceded by vector adaptive chip-based
non-linear processing [10].


Key  Words:  Spread  spectrum  communication;  impulse  noise;  adaptive  filters 

0.  INTRODUCTION

Signal detection in the presence of impulsive channel noise has been considered
extensively in the past (for example in [1]-[3] and references therein), while
detection of a direct-sequence spread-spectrum (DS-SS) signal under similar channel
conditions has been studied in [4], [5], and [10]. Receiver proposals in [4], [5] involve
the use of either a conventional signature matched filter or a majority-vote receiver
(hard-limiter non-linearity per chip followed by signature matched-filtering). In
[4] it is reported that neither one of the above proposals is universally effective
against the combination of SS interference and non-Gaussian impulsive noise. In
[10], adaptive receivers are developed that consist of a vector of adaptive
chip-based non-linearities followed by an adaptive linear tap-weight filter. The
structures proposed in [10] tap the relative merits of both non-linear and linear
signal processing and exhibit superior BER performance in the presence of
combined impulsive and SS interference. In particular, the non-linear receiver
front-end adapts itself to the unknown prevailing impulsive noise environment, while the
adaptive linear tap-weight filter that follows the non-linearly processed chip samples
effectively combats the SS interference. This article enhances our previous work in
[10] in the following aspects. The receiver design objective is shifted to superior
bit error rate (BER) performance under rapid short-data-record adaptation. In
addition, the signal model is generalized to account for multipath signal reception,
and a new Hampel-type non-linear pre-processor is considered that encompasses
the pre-processors considered in [10] as special cases.

1.  SIGNAL  MODEL 

The baseband received signal is viewed as the aggregate of the multipath received
SS signal of interest with signature code $\mathbf{s}_0$ of length L (if T is the symbol period and
$T_c$ is the chip period then $L = T/T_c$), K - 1 multipath received DS-SS interferers
with unknown signatures $\mathbf{s}_k$, k = 1, ..., K - 1, and non-Gaussian (impulsive)
interference. For notational simplicity and without loss of generality, we choose a
synchronous signal set-up. We assume that the multipath spread is of the order of a
few chip intervals, M, and since the signal is bandlimited to $B = 1/T_c$ the channel
is modeled as a tap-delay line with M + 1 taps spaced at chip intervals $T_c$. After
conventional chip-matched filtering and sampling at the chip rate over a multipath
extended symbol interval of L + M chips, the L + M data samples are organized
in the form of a vector $\mathbf{r}$ given by

    $\mathbf{r} = \sum_{k=0}^{K-1}\sum_{m=0}^{M} c_{k,m}\sqrt{E_k}\,\left(b_k\mathbf{s}_{k,m} + b_k^-\mathbf{s}^-_{k,m} + b_k^+\mathbf{s}^+_{k,m}\right) + \mathbf{n}$    (1)

where, with respect to the kth SS signal, $E_k$ is the transmitted energy, $b_k$, $b_k^-$, and
$b_k^+$ are the present, the previous, and the following transmitted bit, respectively, and
$\{c_{k,m}\}$ are the coefficients of the frequency-selective slowly fading channel modeled
as independent zero-mean complex Gaussian random variables that are assumed
to remain constant over several symbol intervals. $\mathbf{s}_{k,m}$ represents the 0-padded
by M, m-cyclic-shifted version of the signature of the kth SS signal $\mathbf{s}_k$, $\mathbf{s}^-_{k,m}$ is
the 0-filled (L - m)-left-shifted version of $\mathbf{s}_{k,0}$, and $\mathbf{s}^+_{k,m}$ is the 0-filled
(L - m)-right-shifted version of $\mathbf{s}_{k,0}$. Finally, $\mathbf{n}$ represents additive complex non-Gaussian
impulsive noise.

For conceptual and notational simplicity we may rewrite (1) as follows:

    $\mathbf{r} = \sqrt{E_0}\,b_0\,\mathbf{w}_{R\text{-}MF} + \mathbf{I} + \mathbf{n}$    (2)

where $\mathbf{w}_{R\text{-}MF} = \sum_{m=0}^{M} c_{0,m}\mathbf{s}_{0,m}$ is the effective (channel processed) signature of
the SS signal of interest (signal 0) and $\mathbf{I}$ identifies comprehensively both the
Inter-Symbol and the SS interference present in (1). We use the subscript R-MF in our
effective signature notation to make a direct association with the RAKE
Matched-Filter receiver that is known to correlate the signature $\mathbf{s}_0$ with M + 1 size-L shifted
windows of the received signal (that correspond to the M + 1 paths of the channel),
appropriately weighted by the conjugated channel coefficients $c_{0,m}$, m = 0, ..., M.
In our notation, this RAKE operation corresponds to linear filtering of the form
$\mathbf{w}_{R\text{-}MF}^H\mathbf{r}$, where H denotes the Hermitian operation.
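
The construction of the effective signature can be sketched in a few lines of pure Python. The helper names `effective_signature` and `rake_output` are hypothetical, and the zero-padded, m-shifted signature is interpreted here as the original chips placed at offset m inside a length L + M vector.

```python
def effective_signature(s0, c0):
    """Effective signature w_R-MF = sum over m of c0[m] * s_{0,m}.

    s0 : list of L signature chips
    c0 : list of M + 1 channel coefficients c_{0,0}, ..., c_{0,M}
    """
    L, M = len(s0), len(c0) - 1
    w = [0j] * (L + M)
    for m, coeff in enumerate(c0):
        for i, chip in enumerate(s0):
            w[i + m] += coeff * chip       # s0 padded with M zeros, shifted by m chips
    return w

def rake_output(w, r):
    """RAKE matched-filter statistic w^H r."""
    return sum(wi.conjugate() * ri for wi, ri in zip(w, r))
```

For example, `effective_signature([1, -1], [1, 0.5])` combines the direct path with a half-amplitude echo delayed by one chip.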


2.  RECEIVER  ARCHITECTURE  AND  ALGORITHMIC 
DEVELOPMENTS 

For the multipath signal model of the previous section, the general receiver
structure under consideration is given in Fig. 1. The receiver consists of a non-linear
front-end in the form of a vector of parametrized non-linearities
$\mathbf{g}(\mathbf{r};\cdot): \mathbb{C}^{L+M} \to \mathbb{C}^{L+M}$, followed by linear filter post-processing by an
(L + M)-dimensional complex tap-weight filter $\mathbf{w}$. The non-linear pre-processor considered in
this present work employs Hampel-type non-linearities
$\mathbf{g}(\mathbf{r}; a_1, a_2, a_3) = [g(r_1; a_1, a_2, a_3), \ldots, g(r_{L+M}; a_1, a_2, a_3)]^T$,
where T denotes the transpose operation and

    $g(x; a_1, a_2, a_3) = \begin{cases} x, & |x| \le a_1 \\ a_1\frac{x}{|x|}, & a_1 < |x| \le a_2 \\ a_1\frac{a_3 - |x|}{a_3 - a_2}\frac{x}{|x|}, & a_2 < |x| \le a_3 \\ 0, & \text{otherwise,} \end{cases}$  with $0 < a_1 < a_2 < a_3$.    (3)

In (3) x is a complex number and |x| denotes the magnitude of x. The linear
region of the Hampel non-linearity has the effect of passing the observations
undistorted. The non-linear regions either completely reject (remove) or "correct" the
observations. The latter is considered as an adjustment of the magnitude while
maintaining the phase. The parameters $a_1$, $a_2$, and $a_3$ are positive cut-off
parameters to be determined adaptively. The Hampel pre-processor is a generalization of
the puncher and clipper pre-processors considered in [10].
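
A minimal Python sketch of a Hampel-type non-linearity acting on complex chip samples is given below. It follows the standard three-part redescending form; the function names are hypothetical, and allowing equal cut-offs (so that a puncher with a1 = a2 = a3, or a clipper with a2 = a3 = infinity, appear as limiting cases) is one reading of the generalization mentioned above rather than something stated in the source.

```python
def hampel(x, a1, a2, a3):
    """Hampel-type non-linearity for a complex sample x: the magnitude is
    adjusted according to the cut-offs while the phase is preserved."""
    if not (0 < a1 <= a2 <= a3):
        raise ValueError("cut-offs must satisfy 0 < a1 <= a2 <= a3")
    mag = abs(x)
    if mag <= a1:
        return complex(x)                  # linear region: pass undistorted
    unit = x / mag                         # unit-magnitude phase factor
    if mag <= a2:
        return a1 * unit                   # clip the magnitude to a1
    if mag <= a3:
        return a1 * (a3 - mag) / (a3 - a2) * unit   # taper linearly towards zero
    return 0j                              # reject the observation

def hampel_vector(r, a1, a2, a3):
    """Apply the non-linearity to every chip sample of the received vector."""
    return [hampel(x, a1, a2, a3) for x in r]
```

In an adaptive receiver the three cut-offs would be updated from the data; here they are simply passed in as fixed arguments.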

In [10] the linear filter post-processor was chosen to be the
Minimum-Variance-Distortionless-Response (MVDR) solution for the non-linearly processed data
vectors. Adaptive SS interference suppression with MVDR post-processing has two
shortcomings that we attempt to improve upon in this article. First, the adaptive
optimization computational complexity may be prohibitive for mobile SS receivers
due to the (L + M) x (L + M) autocorrelation matrix inversion operation. This is
particularly true for systems with large spreading gain L. Second and most
important, the data-estimated adaptive implementation of the MVDR filter
$\hat{\mathbf{w}}^g_{MVDR}(N) = [\mathbf{w}^H_{R\text{-}MF}\hat{\mathbf{R}}_g^{-1}(N)\mathbf{w}_{R\text{-}MF}]^{-1}\hat{\mathbf{R}}_g^{-1}(N)\mathbf{w}_{R\text{-}MF}$,
where $\hat{\mathbf{R}}_g(N) = \frac{1}{N}\sum_{n=1}^{N}\mathbf{g}(\mathbf{r}_n)\mathbf{g}(\mathbf{r}_n)^H$ is
the sample average estimate of the autocorrelation matrix over a data record of
N Hampel processed input vectors, exhibits disappointing short data record
performance. Data records of size many times the input vector dimension L + M are
necessary to approach satisfactorily the BER performance of the ideal $\mathbf{w}^g_{MVDR}$ filter,
i.e. the filter that assumes perfectly known $\mathbf{R}_g$. In the following we address those
two issues of reduced optimization complexity and superior small-sample-support
performance in the context of what we call auxiliary-vector processing.
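
For reference, the sample-average MVDR filter described above can be sketched as follows. NumPy is assumed, the name `sample_mvdr` is hypothetical, and a linear solve is used in place of an explicit matrix inverse; the record must contain at least L + M vectors for the estimate to be invertible.

```python
import numpy as np

def sample_mvdr(G, w_rmf):
    """Data-estimated MVDR filter.

    G     : (L + M) x N array whose columns are the pre-processed vectors g(r_n)
    w_rmf : effective signature of the signal of interest, length L + M
    """
    n_records = G.shape[1]
    R_hat = G @ G.conj().T / n_records        # sample-average autocorrelation matrix
    Rinv_w = np.linalg.solve(R_hat, w_rmf)    # R^{-1} w without forming the inverse
    return Rinv_w / np.vdot(w_rmf, Rinv_w)    # scale so that the filter's w^H w_rmf = 1
```

The cost of the solve grows with the cube of L + M, which is the complexity concern raised in the text, and the quality of `R_hat` for small `n_records` is the short-data-record concern.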
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We  consider  the  class  of  linear  filter  post-processors  w  in  Fig.  1  that  are  “dis¬ 
tortionless”  in  the  w£mf  vector  direction  of  interest,  i.e.  w^Wr.mf  =  Hwr.mpII  so 
that  no  cancelation  of  the  signal  of  interest  takes  place.  This  filter  class  is  the  set 
of  all  filters  that  can  be  written  in  the  form 


A 

where,  for  notational  simplicity,  n  =  denotes  the  normalized  RAKE 

matched  filter  for  the  SS  signal  of  interest  0,  is  a  complex  scalar,  and  Qs  is  a 
vector  in  the  L  +  M  complex  space  that  is  orthonormal  with  respect  to 


The  superscript  g  that  appears  in  (4)  is  intended  to  serve  as  a  reminder  that  any 
specific  choice  for  the  scalar  ^  and  the  vector  Q  needs  to  account  for  the  non-linear 
Hampel  pre-processor  g(  )  in  Fig.  1.  The  receiver  architecture  that  incorporates 
post-filtering  by  wi^  in  (4)  is  shown  in  Fig.  2. 

In contrast to minimum output variance optimization that leads to the optimum
$\mathbf{w}^g_{MVDR}$ filter [10], we propose to choose an “auxiliary vector” $\mathbf{Q}^g$ that
satisfies the orthonormality constraint in (5) and maximizes the magnitude of the
cross-correlation between points (a) and (b) of the receiver structure in Fig. 2. Standard
Lagrange multipliers derivation shows that this vector is

$$\mathbf{Q}^g = \frac{\mathbf{R}_g \mathbf{w}_{\text{R-MF}\|} - (\mathbf{w}^H_{\text{R-MF}\|} \mathbf{R}_g \mathbf{w}_{\text{R-MF}\|}) \mathbf{w}_{\text{R-MF}\|}}{\|\mathbf{R}_g \mathbf{w}_{\text{R-MF}\|} - (\mathbf{w}^H_{\text{R-MF}\|} \mathbf{R}_g \mathbf{w}_{\text{R-MF}\|}) \mathbf{w}_{\text{R-MF}\|}\|} \qquad (6)$$

Then, the complex scalar weight $\mu^g$ in the receiver structure of Fig. 2 is chosen to be the
value that minimizes the Mean-Square (MS) error between points (a) and (c). Direct application of
the Yule-Walker theorem shows that this MS-optimum value of $\mu^g$ is

$$\mu^g = \frac{\mathbf{Q}^{gH} \mathbf{R}_g \mathbf{w}_{\text{R-MF}\|}}{\mathbf{Q}^{gH} \mathbf{R}_g \mathbf{Q}^g} \qquad (7)$$


This filter design approach can be generalized to cover processing with multiple auxiliary
vectors of the form
$\mathbf{w}^g_{AV} = \mathbf{w}_{\text{R-MF}\|} - \sum_{i=1}^{P} \mu_i^g \mathbf{Q}_i^g$, where
each $\mathbf{Q}_i^g$, $i = 1, \ldots, P$, is orthonormal with respect to
$\mathbf{w}_{\text{R-MF}\|}$ and the other auxiliary vectors. The weighted auxiliary vectors are
conditionally optimized in a sequential fashion as follows: $\mathbf{Q}_1^g$ and $\mu_1^g$ are
chosen as before, in (6) and (7), respectively. Given $\mathbf{Q}_1^g$ and $\mu_1^g$,
$\mathbf{Q}_2^g$ is set to be the vector, orthonormal to $\mathbf{w}_{\text{R-MF}\|}$ and
$\mathbf{Q}_1^g$, that maximizes the magnitude of the cross-correlation between
$(\mathbf{w}_{\text{R-MF}\|} - \mu_1^g \mathbf{Q}_1^g)^H g(\mathbf{r})$ and
$\mathbf{Q}_2^{gH} g(\mathbf{r})$. Given $\mathbf{Q}_1^g$, $\mu_1^g$, and $\mathbf{Q}_2^g$, the
weight value $\mu_2^g$ is chosen to minimize the MS error between
$(\mathbf{w}_{\text{R-MF}\|} - \mu_1^g \mathbf{Q}_1^g)^H g(\mathbf{r})$ and
$\mu_2^{g*} \mathbf{Q}_2^{gH} g(\mathbf{r})$. Inductively, we can obtain similar expressions for
$\mathbf{Q}_{p+1}^g$ and $\mu_{p+1}^g$ given $\mathbf{Q}_1^g, \mu_1^g, \ldots, \mathbf{Q}_p^g, \mu_p^g$,
$1 \le p < P$. The auxiliary vector generation procedure may stop when the cross-correlation
magnitude
$|(\mathbf{w}_{\text{R-MF}\|} - \sum_{i=1}^{p} \mu_i^g \mathbf{Q}_i^g)^H \mathbf{R}_g \mathbf{Q}_{p+1}^g|$
falls below a prespecified threshold value.

The auxiliary vector filter $\mathbf{w}^g_{AV}$ defined above has two major advantages in
comparison with the $\mathbf{w}^g_{MVDR}$ that was used in [10]. The first advantage has to do
with the pertinent optimization computational complexity. While both filters are a function of
the RAKE matched filter $\mathbf{w}_{\text{R-MF}}$ and the Hampel pre-processed input data
autocorrelation matrix $\mathbf{R}_g$, no matrix inversion operation is required for the
auxiliary-vector filter. The second and most important advantage has to do with the short data
record behavior of the filter estimators $\hat{\mathbf{w}}^g_{AV}(N)$ and
$\hat{\mathbf{w}}^g_{MVDR}(N)$ that are based on an $N$-point estimate of the autocorrelation
matrix $\hat{\mathbf{R}}_g(N) = \frac{1}{N} \sum_{n=1}^{N} g(\mathbf{r}_n) g(\mathbf{r}_n)^H$.
The variance of $\hat{\mathbf{w}}^g_{AV}(N)$ is significantly lower than the variance of
$\hat{\mathbf{w}}^g_{MVDR}(N)$ and this translates to superior short data record performance as
seen in the next section. Of course, as $N \to \infty$,
$\hat{\mathbf{w}}^g_{AV}(N) \to \mathbf{w}^g_{AV}$ and
$\hat{\mathbf{w}}^g_{MVDR}(N) \to \mathbf{w}^g_{MVDR}$ with probability one, and in general
$\mathbf{w}^g_{AV} \ne \mathbf{w}^g_{MVDR}$. So, this is a case of trading bias for lower variance.
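
The sequential auxiliary-vector construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `auxiliary_vector_filter`, the parameters `num_aux` and `threshold`, and the choice to keep every new auxiliary vector orthogonal to all earlier ones are assumptions made for the example. `R` stands for the (estimated) autocorrelation matrix of the Hampel-processed data and `v` for the RAKE matched filter.

```python
import numpy as np

def auxiliary_vector_filter(R, v, num_aux=3, threshold=1e-10):
    """Sketch of sequential auxiliary-vector filtering (hypothetical helper).

    R : Hermitian autocorrelation matrix of the pre-processed data.
    v : matched-filter vector (normalized internally).
    Returns w = v_n - sum_i mu_i * Q_i without inverting R.
    """
    v_n = v / np.linalg.norm(v)
    w = v_n.copy()
    basis = [v_n]  # every new auxiliary vector is kept orthogonal to these
    for _ in range(num_aux):
        # Direction that maximizes |w^H R Q| under the orthonormality
        # constraints: project R w onto the orthogonal complement of basis.
        d = R @ w
        for u in basis:
            d = d - (u.conj() @ d) * u
        norm_d = np.linalg.norm(d)
        # norm_d equals the cross-correlation magnitude |w^H R Q|,
        # so this test is the stopping rule described in the text.
        if norm_d < threshold:
            break
        Q = d / norm_d
        # MS-optimum weight for the current filter, as in (7).
        mu = (Q.conj() @ R @ w) / (Q.conj() @ R @ Q)
        w = w - mu * Q
        basis.append(Q)
    return w
```

Each pass needs only matrix-vector products, which reflects the "no matrix inversion" property claimed for the auxiliary-vector filter. With `num_aux=1` the loop reduces to equations (6) and (7).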

To complete the algorithmic developments for a fully adaptive implementation of the DS-SS
receiver in Fig. 2, we turn our attention to the Hampel cut-off parameters $a_1$, $a_2$, $a_3$.
Adaptive cut-off parameter optimization can be pursued exactly as in [10], in the form of a
decision driven, Minimum Bit-Error-Rate (MBER) stochastic approximation recursion.

3.  NUMERICAL  AND  SIMULATION  STUDIES 

We examine DS-SS signal transmissions with spreading gain $L = 63$ in the presence of 5 SS
interfering signals and impulsive noise. The normalized synchronous signature cross-correlation
of the interfering signals with the signal of interest is approximately 25% while the signature
codes of the interferers are nearly orthogonal to each other. The communication channel is
modeled as a multipath Rayleigh fading channel with 4 paths and zero mean complex Gaussian fading
coefficients of variance 0.5 (i.e. $E\{|c_{k,m}|^2\} = 0.5$) for all paths $m = 0, \ldots, 3$ and
all SS signals $k = 0, \ldots, 5$. The average total received interfering signal energies
$E_k \sum_{m=0}^{3} E\{|c_{k,m}|^2\}$ are set equal to 9, 10, 11, 12, and 13 dB for
$k = 1, 2, \ldots, 5$, respectively. The impulsive channel noise is modeled according to the
familiar $\epsilon$-mixture disturbance model
$f_\epsilon(x) = (1 - \epsilon) f_0(x) + \epsilon f_1(x)$ where $\epsilon \in [0, 1]$ accounts
for the probability under which the noise is $f_1(\cdot)$ distributed. The nominal pdf
$f_0(\cdot)$ is taken to be 0-mean complex Gaussian with variance $\sigma_0^2 = 1$. The
“contaminating” pdf $f_1(\cdot)$ is assumed to be 0-mean complex Gaussian with variance
$\sigma_1^2 = \gamma^2 \sigma_0^2$ ($\gamma^2$ = [illegible]) and $\epsilon$ is set equal to 0.2.
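
As a rough illustration of the $\epsilon$-mixture disturbance model, the sketch below draws complex noise samples that are nominal with probability $1 - \epsilon$ and contaminated with probability $\epsilon$. The function name `epsilon_mixture_noise` and its parameters are hypothetical, and the default `gamma2=100.0` is a placeholder chosen only for the example; it is not the value used in the study.

```python
import numpy as np

def epsilon_mixture_noise(n, eps=0.2, var0=1.0, gamma2=100.0, seed=None):
    """Draw n samples from (1 - eps) * CN(0, var0) + eps * CN(0, gamma2 * var0).

    gamma2 is an assumed placeholder, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Each sample is independently "contaminated" with probability eps.
    contaminated = rng.random(n) < eps
    var = np.where(contaminated, gamma2 * var0, var0)
    # Circular complex Gaussian: real and imaginary parts each carry var / 2.
    z = rng.normal(size=n) + 1j * rng.normal(size=n)
    return np.sqrt(var / 2.0) * z
```

The average power of the generated samples should approach $(1 - \epsilon)\sigma_0^2 + \epsilon\gamma^2\sigma_0^2$ for long records, which gives a quick sanity check on any chosen parameter set.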

In Fig. 3 we compare the BER behavior of the conventional MVDR filter, the Hampel-MVDR filter,
and the Hampel-AV (Auxiliary-Vector) filtering procedures developed herein. All cut-off
parameter and filter estimates are based on a data record of 128 samples. The multipath fading
channel is assumed to remain constant during adaptation and the BER induced by each receiver for
each channel is averaged over 100 randomly drawn channels and 10 receiver realizations per
channel. As seen in Fig. 3, the superiority of the Hampel-AV adaptive receivers is apparent.

To study the effect of the sample support size on the BER performance of the adaptive receivers
under examination, we fix the average total received energy of the SS signal of interest at
$E_0 \sum_{m=0}^{3} E\{|c_{0,m}|^2\} = 12$ dB and we repeat the studies of Fig. 3 as a function
of the data record size. Fig. 4 demonstrates the superiority of AV post-filtering over the whole
data support range of practical interest.

FIG. 1. General receiver structure.


FIG. 2. Auxiliary-vector receiver structure.


FIG. 3. Bit-Error-Rate versus total received energy for the SS signal of interest.


FIG. 4. Bit-Error-Rate versus sample support.


A Model for Shape and Texture
Content-Based Image Compression

Abstract: Currently a large area of research is being devoted to compressed domain content-based compression due to the
JPEG-2000 and MPEG-7 requirements. Current compression quantization techniques are not geared to extract models of image
structures as a part of the compression process because they are geared to attain maximum compression ratio for the
lowest cost. Additionally, the entropy encoding processes leave the compressed data unrecognizable because they need to
completely randomize the data for maximum compression. Recently there has been much work in wavelet and fractal methods
for texture and shape segmentation as well as data compression. These methods contain implicit models for shape and
texture coding as a natural part of the compression process. We thus develop a method for wavelet fractal compression
which extracts and codes shape and texture primitives. This method makes use of the Mallat Gaussian derivative basis set
and an implicit Markov shape and texture model.

I.  Introduction 


Recently there has been much interest in defining the relationship between wavelets and fractals [5]. We will first
discuss a particular type of wavelet developed by Stephane Mallat which uses the Gaussian derivative for a basis set.
Secondly we will discuss how the properties of this wavelet can be used to enhance the block quantization process in
wavelet fractal encoding. We will then show the parallels between the wavelet fractal block quantization process and a
well known wavelet Markov model and use the properties of the model to enhance the compression process. We will then
describe how this Markov model can be used to integrate texture and shape into the compression process. Finally we will
describe the image reproduction process and show an application of our compression method.


II. Gaussian Derivative Wavelet


The highpass g(x) and lowpass h(x) elements of the Gaussian derivative wavelet transform are often cascaded in a filter
bank structure as is shown in Figure 1. This methodology is a computationally efficient means of dividing a signal into
an organized set of frequency bands. One type of filter bank structure is the Mallat [14] multiresolution frequency bank
or MRA for short. A particular implementation of this transform is known as the dyadic MRA structure. This particular
implementation of the MRA uses the Gaussian derivative [14] basis set as shown in . As we shall show later the Gaussian
derivative has many desirable properties useful in signal and image analysis, including an ability to accurately reveal
boundaries and edges in signals and imagery and minimal artifact production in image reproduction. The result is an
extremely effective means of detecting singularities (edges) in a signal.

Given this Gaussian basis, Mallat [14] uses an elegant technique for two dimensional decomposition which lends itself
to image compression and is the basis of our wavelet fractal method. The algorithm consists of first preprocessing an
image with a multiscale wavelet decomposition as described by Mallat. In two dimensions the lowpass is performed as a
separable transform as shown in Figure 3 and the highpass is done as the 1-D transform in X and Y dimensions
independently.

$$W f = (W^1 f(x,y), W^2 f(x,y)). \qquad (1)$$

The X and Y highpass functions are then combined to form a modulus edge image and a gradient edge image as is described
in equations 2 and 3.

$$M f(x,y) = \sqrt{|W^1 f(x,y)|^2 + |W^2 f(x,y)|^2} \qquad (2)$$

$$A f(x,y) = \arg(W^1 f(x,y) + i W^2 f(x,y)) \qquad (3)$$

The difference between Mallat's two dimensional biorthogonal transform and the ordinary orthogonal Haar two dimensional
transform is that he does not subsample his images as in the 1 dimensional case and that he applies only one filter in
the X and Y directions to compute a polar representation for modulus maxima for each of the high pass bands. Thus,
Mallat has only two high pass bands, which can be represented in terms of x and y or in polar representation.
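
The polar representation in equations 2 and 3 is a pointwise operation on the two highpass bands, which the following sketch makes concrete. The function name `modulus_and_angle` is hypothetical, and the inputs `w1` and `w2` are assumed to be arrays already holding the X and Y highpass outputs; how those bands are computed is outside the scope of the example.

```python
import numpy as np

def modulus_and_angle(w1, w2):
    """Combine X and Y highpass bands into modulus and gradient-angle images.

    w1, w2 : arrays of equal shape (assumed X and Y highpass outputs).
    """
    modulus = np.sqrt(np.abs(w1) ** 2 + np.abs(w2) ** 2)  # equation 2
    angle = np.angle(w1 + 1j * w2)                        # equation 3
    return modulus, angle
```

For a quick experiment, simple finite differences such as `np.gradient(img, axis=1)` and `np.gradient(img, axis=0)` can stand in for the two bands; they are only a stand-in and not the Gaussian derivative filters discussed in the text.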

IV.  Fractal  Encoding 

We now describe the process of fractal quantization. Classic fractal compression is described in terms of Haar basis
sets and cross scale approximation, which is in effect performed due to the averaging process in fractal compression.
There are essentially three parameters needed for fractal reconstruction, as was discussed in Jacquin's and Fisher's
techniques and indicated in equation 4

$$f(x) = T f(x) = U_L f(x) + b = Q_L f(2x) + b \qquad (4)$$

where f(x) is the image to be transformed and T is a contractive operator with unique fixed point f. Encoding f means
finding an iterator T having a fixed point approximately equal to f, while decoding is equivalent to finding the fixed
point f by iterating T starting with an image selected at random. $U_L$ represents the transformation applied to the
domain blocks, which both scales grey levels and decimates; $Q_L$ represents a simpler $U_L$ that simply scales the
already decimated domain blocks, and b is the intensity offset applied to the domain blocks. Note that $f, b \in V$
where V is a discrete and finite dimensional space. Jacquin [10] describes this block matching between scales in terms
of a Markov operator, since in a pure fractal (Global IFS) [3] intensity and geometric matching occurs at all wavelet
scales whereas in Jacquin's fractal approximation it occurs between two wavelet scales.
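
The decoding idea in equation 4, iterating a contractive operator from an arbitrary starting image until it settles on its fixed point, can be illustrated with a one-dimensional toy. Everything specific here is an assumption for the example: the names `decimate_and_tile`, `apply_T`, and `decode`, the pairwise averaging used for decimation, and the single scalar gain `q` standing in for $Q_L$.

```python
import numpy as np

def decimate_and_tile(f):
    """Average neighbouring pairs (decimation by 2), then tile back to full length."""
    half = 0.5 * (f[0::2] + f[1::2])
    return np.concatenate([half, half])

def apply_T(f, q, b):
    """Toy version of T f = q * (decimated f) + b; contractive when |q| < 1."""
    return q * decimate_and_tile(f) + b

def decode(q, b, n_iter=60, seed=0):
    """Iterate T from a random start; the result approaches the unique fixed point."""
    rng = np.random.default_rng(seed)
    f = rng.normal(size=b.shape)
    for _ in range(n_iter):
        f = apply_T(f, q, b)
    return f
```

Because the averaging step cannot increase the largest absolute sample value, any `q` with magnitude below one makes the map a contraction, so two different random starting signals end at the same result. The signal length is assumed to be even.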

Instead of using the Haar basis set we insert the Mallat Gaussian derivative wavelet transform into the fractal
compression. It is also interesting to note, as shown in , that the biorthogonal spline has a much more localized
frequency response than the Haar basis set [5]. This fact is extremely important in the fractal reconstruction process
since the Haar basis produces many high frequency artifacts in reproduced images due to its sharp spatial domain cutoff.
The Gaussian derivative spline has a much smoother spatial cutoff and thus much less tendency to create artifacts in
imagery, as is shown in the frequency plot below. Looking at all models of fractal encoding we realize that the trend is
to reconstruct an object from its low frequency components to its high frequency components. In the wavelet fractal case
this method approximates wavelet coefficients across scales. This model fits with the fractal method since low frequency
corresponds to large scale. We also know that the Mallat multiresolution decomposition happens in dyadic scales which,
in the Haar basis set case, corresponds to blocks whose dimensions are powers of 2 in size. Thus we shall keep with this
framework for the Gaussian derivative basis set. Putting this in the context of the wavelet transform we re-write
equation 4 by inserting equation 1 as:

=  QiW^(x,y)  +  b  (5) 


Thus we build our reconstructed image from the low frequency or large scale images first and then eventually reconstruct
the final image. Note that this process uses the wavelet decomposition to explicitly separate scales by frequency. Thus
for a given block size we only have information that fits that particular scale.

V.  Localized  Texture  Coding 

In order to determine the intensity offset or b parameter from equation 5 between two scales we find the blocks that
best match each other between two scales. In traditional fractal encoding we do this with an exhaustive IMS search of
range to domain blocks and take the block with the lowest difference. If we look at wavelet theory, however, we find
that such exhaustive block matching is unnecessary. The reason stems from observing the properties of a Markov wavelet
model described by Luettgen and Willsky [12].

To represent this Markov random field we define a given node in the quad tree structure as $s$, its children nodes as
$s\alpha_{NW}, s\alpha_{NE}, s\alpha_{SE}, s\alpha_{SW}$, and its parent node as $s\gamma$, where $\gamma$ shifts the
wavelet coefficients from parent $s\gamma$ to child $s$. Now, defining a MRF on a $2^N \times 2^N$ lattice, a state at
the mth level represents the values of the MRF at $16(2^{N-m-1} - 1)$ points. This set of points is denoted as
$\Gamma_s$ and it is the union of 4 mutually exclusive subsets. In general we can divide $\Gamma_s$ into four sets of
$4(2^{N-m(s)-1} - 1)$ points in a similar fashion, and we denote these subsets as $\Gamma_{s,i}$,
$i \in \{NW, NE, SE, SW\}$. Now if we have the random variable Z representing the current state at any stage of the tree
then we insert our local IFS relationship as

$$Z_s = Q_L Z_{s\gamma} + b \qquad (6)$$

defined in the fractal wavelet context as

$$Z_s \in W f(x,y) \qquad (7)$$

thus the basic probabilistic Markov relationship is defined as


(Z„te  r,^\Z^Te  F,  ,.) 


(8) 


This relationship defines regions of constant texture. For any region of constant texture the slope of the energy decay
across scales remains fixed. This energy decay, also known as the Lipschitz or Holder exponent and closely related to
fractal dimension [15], can be characterized in detail by the Markov model. If we select some $\alpha$ where
$0 < \alpha < 1$, the function f(x,y) is uniformly Lipschitz $\alpha$ over an open set of reals if there exists a
constant K such that for all points (x,y) of this open set

$$M_{2^j} f(x,y) \le K (2^j)^{\alpha} \qquad (9)$$

where $M_{2^j} f$ is the modulus of equation 2 at scale $2^j$. $\alpha$ represents the slope of the decay function at
any given spatial point in an image. The b parameter is a linear approximation of $\alpha$ between any two scales if it
is computed using the localized Markov approach, since $\alpha$ seeks to characterize the decay across scales within the
cone of influence of the transform. This cone of influence resides within the spatial boundaries of the quadtree in our
case because of the dyadic decomposition of the wavelet transform in this area.
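
One common way to turn the bound in equation 9 into a number is to treat it as an approximate equality and fit a straight line to the base-2 logarithm of the modulus against the scale index, since $\log_2 M_{2^j} \approx \log_2 K + \alpha j$. The sketch below does exactly that. It is an assumption-laden illustration: the helper name `estimate_lipschitz_exponent` is hypothetical, the modulus values are assumed positive and taken at one spatial location for scales $2^1, 2^2, \ldots$, and the paper itself does not state that it uses a least-squares fit.

```python
import numpy as np

def estimate_lipschitz_exponent(modulus_by_scale):
    """Fit log2(M) = log2(K) + alpha * j over dyadic scale indices j = 1, 2, ...

    Returns (alpha, K). Assumes strictly positive modulus values.
    """
    m = np.asarray(modulus_by_scale, dtype=float)
    j = np.arange(1, m.size + 1)  # scale index, scale = 2**j
    slope, intercept = np.polyfit(j, np.log2(m), 1)
    return slope, 2.0 ** intercept
```

The fitted slope plays the role of $\alpha$, the decay slope across scales, and a region of constant texture would be expected to give similar slopes at neighbouring locations.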

VI.  Localized  Shape  Coding 

To find a mapping $Q_L$ from shape to texture for a particular object we need an efficient mechanism for determining a
geometric mapping. If we recall, range and domain blocks in traditional fractal encoding are obtained by subsampling the
image and then matching each range block to every possible domain block in an image [10]. Needless to say this encoding
process takes an extremely long time and is one of the major drawbacks of traditional fractal encoding.

To simplify the process of range to domain block matching, thus finding the mapping for $Q_L$ in equation 5, we classify
the range and domain blocks by summing the blocks' gradient angle parameters, since these gradients are accurate
indications of energy and direction within each block. Jacquin used a similar procedure in his classification of blocks
in traditional fractal encoding by applying the centered operator to each block. Now with Mallat's wavelet decomposition
energy direction is already indicated as a natural part of the process. Thus the operation of block classification
becomes a lookup table procedure rather than an exhaustive matching process. Both range and domain block positions in
the image are stored. The block rotation value is also determined by applying the appropriate flip that makes the block
gradient angles match most closely.

In addition to block matching, the shape contour information about objects in the scene can also be included in the
compressed object by encoding around modulus maxima values. This is also a crude form of quantization around blocks with
energy above a given threshold, since energy with the Gaussian derivative is centered around the edges or zero crossings
of the encoded image. This process can be used to associate groups of blocks in conjunction with the localized texture
metrics. Other well known shape grouping algorithms may also be employed, such as chain coding.


VII. Object Quantization


Rate distortion in a wavelet fractal sense is handled by how many wavelet scales are used in the decomposition of the
image [7]. The number of scales starts with the largest scale, and thus largest range block used. We go to the next
lower scale and thus next smaller range block in a quadtree form if we do not find an adequate match between range and
domain block. This process is now described in terms of classic rate distortion.

Traditional quantization deals with Q [17] as the set of admissible scalar frequency quantization choices, T as the
complete spatial tree and S as the pruned spatial tree where $S \preceq T$. In our model based approach, instead of S
our tree structure is pruned according to some preconfigured model. We denote this tree model $S_M$, which consists of
the set of nodes which characterize our texture and shape region of interest.

Our model optimization problem seeks to solve the traditional rate distortion problem over the frequency quantizer
choices $q \in Q$:

$$\min_{q \in Q} D(q, S_M) \quad \text{subject to} \quad R(q, S_M) \le R_{\text{budget}} \qquad (10)$$

where $D(q, S_M)$ and $R(q, S_M)$ represent the distortion and rate respectively associated with the frequency and
spatial quantizer choices $q \in Q$ and $S_M$.

As in conventional cost optimization, this constrained optimization problem can be solved by unconstraining it via the
Lagrange multiplier $\lambda \ge 0$, which quantifies the trade-off between rate and distortion, and minimizing the
unconstrained Lagrangian cost

$$J(q, S_M) = D(q, S_M) + \lambda R(q, S_M) \qquad (11)$$
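
To make the role of the Lagrange multiplier concrete, the short sketch below picks, from a handful of candidate quantizer choices, the one with the lowest cost $D + \lambda R$. The function name `best_choice`, the tuple layout, and the distortion and rate figures in the usage note are made up for illustration; they are not measurements from this work.

```python
def best_choice(candidates, lam):
    """Return the candidate minimizing distortion + lam * rate.

    candidates : iterable of (label, distortion, rate) tuples (hypothetical layout).
    lam        : non-negative Lagrange multiplier.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])
```

With hypothetical candidates `("coarse", 10.0, 1.0)`, `("medium", 4.0, 3.0)`, and `("fine", 1.0, 8.0)`, a small multiplier favors the low-distortion "fine" choice while a large multiplier favors the low-rate "coarse" choice. Sweeping the multiplier is the usual way to trace out operating points that meet a given rate budget.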

We now use the quantization process Q to define a simple model by the average Holder exponent for the range of scales
over which modulus maxima for a given object are computed. Since b is a linear approximation of the Holder exponent
between two scales and we wish to define the wavelet transform across a range of scales, we can approximate this by an
average over scales.

Thus for a region of constant texture a measurement of $\alpha$ can be determined using the average value of b for a
given range of encoding scales s, denoted $S = \{s : 0 \le s \le N-1\}$, and this is computed as:


$$\alpha_{avg} = \frac{1}{N-1} \sum_{s=0}^{N-1} b_s \qquad (12)$$


Thus we segregate and prune regions of our image by restricting $\alpha_{avg}$ to be less than a certain maximum

$$\alpha_{avg} < \max(\alpha_{avg}) \qquad (13)$$

Because this quantization process naturally removes regions of low intensity around the modulus maxima and the modulus
maxima define shape boundaries of objects, our texture quantization methodology naturally finds the boundaries of
objects and encodes them. Shapiro noticed that the natural decay of frequency between scales can be exploited to
optimize the compression process by eliminating wavelet coefficients that fall below some threshold. For encoded texture
and shape objects our procedure does not make this assumption but encodes the spectrum assuming our Markov model, where
spectral intensity can increase or decrease across scales. Our quantization method is applied within individual objects
to optimize their overall quality. Variation of texture and shape definitions will result in different compression
ratios depending on the uniformity and size of the texture and shape region.

VIII. Decoding

Because our frequency mapping proceeds from low to high frequency our reconstruction process simply proceeds by
iterating our compressed parameters on the low pass image. For each scale the iterative procedure forms the
approximation of the next higher lowpass image, which is then lowpass filtered, and the process is repeated. This
process thus removes all blocky artifacts in the image while still revealing the image features. The above technique
leads to a direct reconstruction method as in the following procedure

j = J

while (j > 0)

j = j - 1

endwhile

Thus in the compressed file our first step is to upsample the compressed lowpass image and then use the stored fractal
local iterated function parameters on it to reconstruct the original image, building each new scale from the previous
lowpass image. The image can be restored to any desired resolution simply by stopping the reconstruction process at a
given scale. This iterative procedure is only order O(N), where N is the number of image blocks for all scales, and thus
can be performed in real time with no special hardware. Thus this approach is a significant advance over the Mallat
alternating projections reconstruction method because it is extremely efficient in its reconstruction speed.


IX.  A  Compression  Application 


We now define a compression application which will allow us to exploit the advantages of our methodology while still
retaining good overall compression performance. It is well known that traditional wavelet entropy encoding methods
outperform fractal and wavelet fractal compression methods by pure PSNR to compression ratio performance. Thus we will
only encode those regions of the image where detailed object analysis is required and leave the rest to be coded by
traditional wavelet entropy methods [5], [15]. In our application we encode the region around Lenna's face with our
encoding method. We maintain high quality around the facial region by compressing at a lower compression ratio around
the face than in the rest of the image. We set our quantization procedure Q by which then specifies the objects that we
encode. We quote two metrics for image PSNR quality: one for the region within Lenna and one for the overall image PSNR.
Compression ratio is given for the overall image. Our results show that our overall image quality is less than that of
traditional methods, but at the higher compression ratios the image quality within the region of interest of our object
surpasses that of the conventional methods. Thus by combining compression and recognition in one operation we can
preserve image quality in those regions which have particular interest to the user and de-emphasize the background. This
region of the image is now content addressable for recognition purposes in the compressed domain, given that its
compressed data (namely b values) were recognized by our quantization scheme Q such that they are the model Markov
structure $S_M$.


Compress Ratio:    8:1, 16:1, 32:1, 64:1
                   33.7, 32.2, 31.5
Image PSNR (dB):   39.0, 36.2, 33.2, 30.2, 27.54
JPEG PSNR (dB):    44.0, 38.1, 33.0, 28.5, -
EZW PSNR (dB):     -, 39.6, 36.3, 33.17, 30.23

Figure 1. Compression Ratio vs. PSNR on 512x512 Lenna 8bpp (preliminary)


X. Conclusion


The  fractal-wavelet  method  offers  a  significant  improvement  over  existing  techniques  because  of  its  unified  approach 
to  image  analysis  and  compression.  The  fractal  wavelet  method  itself  gives  naturally  higher  compression  and  better  reproductive 
quality  than  conventional  DCT-based  methods  for  specific  regions  of  interest.  By  its  wavelet  frequency  division  process,  it  gives  a 
more  natural  organization  to  existing  fractal  methods  and  allows  more  accurate  block  matching.  As  a  result  of  its  modulus  maxima 
shape  representation  it  gives  a  shape  to  texture  content-based  approach  to  compressed  file  organization.  By  its  gradient  based 
block matching technique it is significantly faster than existing wavelet-fractal compression methods.


Abstract

Utilizing coherent pulse Doppler radar waveforms and feature extraction signal
processing techniques, a rotary-wing aircraft's rotor design such as main rotor
configuration (single, twin tandem, twin coaxial, etc), number of blades (2, 3, 4, 5, etc),
tail rotor blade count and configuration (cross, X, star, etc) can be determined. Such
information can be used to assist in the classification and identification of the aircraft.
This paper describes the development of a high fidelity coherent pulse Doppler radar time
domain signature simulator for military rotary-wing aircraft targets. The simulator
models the radar cross section (RCS) backscatter of the rotary-wing aircraft's airframe, main
rotor blades, main hub section, and tail rotor blades as a function of time. The simulator
also has models for simple clutter and noise, which can be added to the target return at
any desired signal-to-clutter (S/C) and/or signal-to-noise (S/N) levels.

1. The Doppler Signature of a Rotary-Wing Aircraft

Figure  1  illustrates  a  typical  military  helicopter  in  forward  flight.  The  main  rotor  blades 
are  rotating  in  a  clockwise  direction  as  viewed  from  the  top,  with  the  tail  rotor  blades 
rotating  in  the  plane  of  the  drawing  from  top-to-bottom. 


Figure 1. Helicopter in Forward Flight

If we assume that a pulse Doppler radar is viewing the aircraft of Figure 1 from the left
along the flight path, the helicopter will exhibit several Doppler signatures to the radar
processor. These will include positive Doppler shifts of the airframe, the advancing main
and tail rotor blades, and the advancing elements of the hub region. Negative Doppler
shifts will result from the retreating blades and retreating hub region scatterers.

Figure 2 shows the complex magnitude time domain signature from the return of an
actual military helicopter including noise and clutter. Figure 3 illustrates the power
spectrum density of the signal (16384 point complex FFT). The recording pulse Doppler
radar was aboard an instrumented aircraft in flight, thus the shifted clutter line. The
helicopter was on an approaching profile in forward flight, resulting in additional
Doppler shift of the airframe (i.e. skin). Centered around the helicopter airframe are the
hub region returns and the advancing and retreating blade returns.



Figure  2.  Magnitude  Time  Domain  Data  of  RW  Aircraft  in  Clutter  and  Noise 


Figure  3.  PSD  of  Time  Domain  Signal 


Feature  extraction  of  the  radar  signal  can  be  achieved  by  selecting  only  the  regions  in  the 
PSD  that  are  of  interest  (i.e.  skin,  hub,  blade,  etc),  and  performing  an  inverse  complex 
FFT  to  return  the  data  to  the  time  domain.  Figure  4  illustrates  such  a  process  for  the 
advancing  blade  flash  region. 
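
The band-selection step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `extract_band`, the band edges, and the synthetic two-tone test signal are hypothetical and are not taken from the recorded data or from any processor used by the authors.

```python
import numpy as np

def extract_band(iq, prf_hz, f_lo_hz, f_hi_hz):
    """Zero every Doppler bin outside [f_lo_hz, f_hi_hz], then inverse-FFT."""
    iq = np.asarray(iq, dtype=complex)
    spectrum = np.fft.fft(iq)
    # Signed Doppler frequency of each FFT bin, in Hz.
    freqs = np.fft.fftfreq(iq.size, d=1.0 / prf_hz)
    keep = (freqs >= f_lo_hz) & (freqs <= f_hi_hz)
    return np.fft.ifft(np.where(keep, spectrum, 0.0))

# Synthetic stand-in for a recorded return: a strong "skin" line at 50 Hz
# and a weaker "blade" component at 300 Hz (both values are illustrative).
prf_hz = 1000.0
t = np.arange(1000) / prf_hz
iq = np.exp(2j * np.pi * 50.0 * t) + 0.5 * np.exp(2j * np.pi * 300.0 * t)
blade_only = extract_band(iq, prf_hz, 250.0, 350.0)
```

A hard rectangular mask is the simplest choice; a tapered window over the selected bins would reduce ringing in the recovered time-domain signal at the cost of some resolution.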


Figure  4.  Time  Domain  -  Advancing  Blades 

Analysis  of  Figure  4  reveals  one  main  rotor  blade  “flash”,  two  main  rotor  sweep-tip 
“flashes”,  and  eight  tail  rotor  “flashes”.  This  information,  combined  with  data  from  the 
retreating  blade  spectrum,  can  significantly  characterize  a  given  rotary-wing  aircraft. 

2.  Rotary-Wing  Signature  Simulator

Figure 5 illustrates the coherent time-domain rotary-wing radar backscatter signature
simulator  developed  at  Georgia  Tech  Research  Institute.  Helo-Sim  consists  of  models 
for: 

1.  advancing  main  rotor  blades 

2.  advancing  tail  rotor  blades 

3.  retreating  main  rotor  blades 

4.  retreating  tail  rotor  blades 

5.  hub  signature  (both  advancing  and  retreating) 

6.  airframe  (skin  signature) 

7.  white  noise 

8.  distributed  clutter 

9.  radar  waveforms 


10.  3-D  dynamic  environment  for  the  radar  platform  and  rotary-wing  target 


Coherent  I/Q  Time  Domain


Figure  5.  Pulse  Doppler  Radar  Rotary-Wing  Target  Signature  Simulator 

The blade modeler generates returns based on the physical and dynamic properties of the
main rotor and tail rotor blades of the aircraft. Variables include: blade length, major
axis, minor axis, composite or all-metal construction, sweep-tip design of the main rotor blade,
number of blades (main and tail), configuration of the tail rotors (cross, X, star, etc.), and
the rotation rate of the main and tail sections. The skin, noise, and hub models are based
on probability estimators with user-defined mean and variance settings. The radar model
defines the radar wavelength and pulse repetition frequency.
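
To make the origin of a blade "flash" concrete, the sketch below sums point scatterers along each rotating blade and applies the two-way phase exp(-j4πR/λ). This is a textbook-style approximation chosen for illustration, not the Helo-Sim blade model: it ignores blade cross-section, material, sweep-tip geometry and amplitude weighting, and the function name `blade_returns` and all parameter values are hypothetical.

```python
import numpy as np

def blade_returns(n_blades, blade_len_m, rot_hz, wavelength_m, prf_hz, n_pulses):
    """Coherent I/Q samples from point scatterers spaced along each blade."""
    t = np.arange(n_pulses) / prf_hz
    # Space scatterers no more than a quarter wavelength apart so the sum
    # approximates a continuous blade without grating lobes.
    n_scat = int(np.ceil(4.0 * blade_len_m / wavelength_m)) + 1
    r = np.linspace(0.0, blade_len_m, n_scat)
    sig = np.zeros(n_pulses, dtype=complex)
    for k in range(n_blades):
        phi = 2.0 * np.pi * k / n_blades          # angular offset of blade k
        # Projection of each scatterer position onto the radar line of sight.
        los = np.outer(np.cos(2.0 * np.pi * rot_hz * t + phi), r)
        sig += np.exp(-1j * 4.0 * np.pi * los / wavelength_m).sum(axis=1)
    return t, sig

# Illustrative values only: 3 blades of 5 m, 5 rev/s, 10 cm wavelength.
t, sig = blade_returns(n_blades=3, blade_len_m=5.0, rot_hz=5.0,
                       wavelength_m=0.1, prf_hz=10000.0, n_pulses=1000)
flash_index = 500   # t = 0.05 s, when blade 0 is broadside to the line of sight
```

In this model the magnitude is small except when a blade is perpendicular to the line of sight, where all scatterers add in phase; that brief coherent peak is the flash seen in the time-domain figures.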

Figure 6 shows several complex time domain magnitude signatures generated by Helo-Sim
for a hypothetical 4-blade main rotor/2-blade tail rotor helicopter. Subplots (a), (b), (c),
and (d) illustrate the forward blade, hub, skin, and retreating blade signatures,
respectively.

Figure 7 shows the composite complex time domain signature for the complete rotary-wing
signature with added noise. Figure 8 illustrates the complex magnitude FFT for the
signal of Figure 7. Review of Figure 8 indicates the helicopter’s skin line, hub, and blade
spectra.


3.  Summary 

Researchers at Georgia Tech have developed a high-fidelity pulse Doppler radar signature
simulator for rotary-wing aircraft. Unique properties of a given helicopter, such as its
main rotor blade design, tail rotor design, and hub structure, can be characterized.
Simulated time domain signatures from the modeled helicopter can be presented to
special radar processors for testing, or used to supply information to a non-cooperative target recognizer.


(b)  Hub


(d)  Retreating  Blades 


Figure  6.  Helo-Sim  Generated  R/W  Time  Domain  Signatures 
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Figure  7.  Composite  R/W  Time  Domain  Signature  (Helo-Sim) 


Comparison  of  Selected  Features  for  Target  Detection  in 
Synthetic  Aperture  Radar  Imagery 

Tristrom  Cooke*  Nicholas  J.  Redding^  Jim  Schroeder^  Jingxin  Zhang^ 


Abstract 

Several methods are available that capture the statistics of radar imagery. The best features, in the sense
of man made target discrimination, are expected to be different for different types of natural background, and
for different objects of interest such as vehicles. We demonstrate that discrimination of natural background
and man made objects using low resolution Synthetic Aperture Radar imagery is possible using multiscale
autoregressive (MAR), multiscale autoregressive moving average (MARMA) models, and singular value
decomposition (SVD) methods. We use the model coefficients, moments of the model residual vectors, a
subset of eigenvectors, and moments of the selected eigenvectors, as features for target discrimination. All
the test imagery used here was 1.5 metre resolution.

Keywords:  multiscale  models,  singular  value  decomposition,  automated  target  detection,  synthetic  aperture 
radar,  natural  background. 


1  Introduction 


Characterising  the  natural  background  or  clutter  environment  in  synthetic  aperture  radar  (SAR)  imagery  is  an 
important  step  in  developing  better  automated  target  detection  (ATD)  tools.  The  ability  to  discriminate  one 
type  of  background  from  another  can  lead  to  the  use  of  adaptive  ATD  algorithms  which  go  beyond  merely  looking 
for  radar  bright  objects.  These  models  capture  the  variation  in  the  statistical  properties  of  a  given  region  in  an 
image.  Natural  backgrounds  exhibit  different  statistics  than  man  made  objects,  which  we  exploit  as  a  means  of 
discrimination. In this work we use Multiscale Autoregressive (MAR) models, Multiscale Autoregressive Moving
Average  (MARMA)  models,  and  Singular  Value  Decomposition  (SVD)  techniques  to  derive  features  sensitive 
to  the  statistical  differences  between  background  clutter  and  man  made  objects  such  as  vehicles. 


2  Multiscale  Modeling  Methods 


Multiscale  methods  have  been  applied  to  the  problem  of  target  detection  and  recognition  in  SAR  imagery  by 
Irving  et  al  [1]  and  Subotic  et  al  [2].  These  methods  differ  from  more  traditional  image  analysis  methods  in 
that  they  operate  on  a  sequence  of  related  images,  each  being  a  view  of  the  same  scene  at  a  different  resolution, 
rather  than  a  single  high-resolution  image.  In  our  work,  this  sequence  of  images  is  organised  as  a  hierarchical 
multiscale  stack,  where  the  resolution  decreases  from  fine  to  coarse  by  a  fixed  factor  progressively  up  through 
the  levels  of  the  stack.  The  models  we  impose  on  these  stacks  are  generalisations  of  those  used  in  ordinary  time 
series.  In  this  context,  the  level  number  (or  scale  index)  plays  the  equivalent  role  to  time. 
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In  [3]  and  [4],  it  was  shown  that  multiscale  autoregressive  (MAR)  models  are  effective  in  discriminating 
vehicle-sized  objects  from  natural  background  in  SAR  imagery.  Here  we  further  quantify  target  discrimination 
performance using receiver operating characteristic (ROC) curves estimated from a larger data set. We also
consider the wider class of multiscale autoregressive moving average (MARMA) processes. These are generalisations
of  ARMA  models  in  time  series  analysis,  and  they  contain  MAR  processes  as  a  special  case. 


A multiscale stack is a sequence of images $X^{(m)}$ indexed by level number $m = 0, \ldots, M$, where $M$ is a
positive integer. Level 0 is the highest resolution image, and the resolution decreases by a factor of two in going
from each level to its successor. The set of pixels at level $m$ is

$$S^{(m)} = \{(i,j) : i = 0, \ldots, i_m - 1,\; j = 0, \ldots, j_m - 1\}.$$

The state of pixel $(i,j)$ in $S^{(m)}$ is $X^{(m)}_{i,j}$, a random variable which may be real or complex. Every pixel in
every level except $M$ has a parent pixel in the level above it. The parent of pixel $(i,j)$ in level $m$ is the pixel
$(i \,\mathrm{div}\, 2,\; j \,\mathrm{div}\, 2)$ in level $m+1$, where div is the integer division operator (i.e. the remainder is discarded). The
ancestor of $(i,j)$ at level $m+k$ (provided $m+k \le M$) is the pixel $(i \,\mathrm{div}\, 2^k,\; j \,\mathrm{div}\, 2^k)$. Every pixel in every level
except the first has four children in the level below it. The children of $(i,j)$ in $S^{(m)}$ are the four pixels $(2i, 2j)$,
$(2i, 2j+1)$, $(2i+1, 2j)$ and $(2i+1, 2j+1)$, all in $S^{(m-1)}$. Note that $i_m = 2^{-m} i_0$ and $j_m = 2^{-m} j_0$, and that both
$i_0$ and $j_0$ must be greater than or equal to $2^M$.
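
The parent, ancestor and child relations reduce to integer division and doubling of pixel indices. A minimal sketch, with hypothetical helper names, is:

```python
def parent(i, j):
    """Parent of pixel (i, j): one level up, indices halved (remainder discarded)."""
    return (i // 2, j // 2)

def ancestor(i, j, k):
    """Ancestor of pixel (i, j) located k levels up the stack."""
    return (i // 2 ** k, j // 2 ** k)

def children(i, j):
    """The four children of pixel (i, j), all one level down."""
    return [(2 * i, 2 * j), (2 * i, 2 * j + 1),
            (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]

def level_shape(i0, j0, m):
    """Image size (i_m, j_m) at level m for a level-0 image of size (i0, j0)."""
    return (i0 // 2 ** m, j0 // 2 ** m)
```

For example, `ancestor(13, 6, 2)` gives `(3, 1)`, and every element of `children(3, 4)` has `(3, 4)` as its parent.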


The multiscale stacks used in this work were generated using coherent quadtree averaging and by use of a
discrete wavelet transform; no significant differences between the two methods have been noted so far. The
input to this process, a complex image of a region of interest at the highest resolution available, becomes level
0 of a complex multiscale stack $Z = \{Z^{(m)}\}$, $m = 0, \ldots, M$. Each of the higher levels is generated from the level
immediately below it with the resolution decreasing by a factor of two at each successive level. The state of
each pixel at each level above the first is given by the arithmetic mean of the states of its children. So

$$Z^{(m)}_{i,j} = \frac{1}{4}\left( Z^{(m-1)}_{2i,2j} + Z^{(m-1)}_{2i,2j+1} + Z^{(m-1)}_{2i+1,2j} + Z^{(m-1)}_{2i+1,2j+1} \right).$$

Once the complex stack has been generated, each level is converted to decibels (log detection). This procedure
produces a second multiscale stack $X = \{X^{(m)}\}$, $m = 0, \ldots, M$, which is the input to the model fitting and
classification procedures.
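
The coherent quadtree averaging and log detection steps can be sketched as follows. The function names are hypothetical, the image dimensions are assumed to be divisible by 2 at every level, and the decibel conversion 20 log10|Z| with a small floor to avoid log(0) is an assumption, since the text only says "converted to decibels".

```python
import numpy as np

def quadtree_stack(z0, n_levels):
    """Return [Z(0), ..., Z(M)]; each level is the complex mean of 2 x 2 child blocks."""
    stack = [np.asarray(z0, dtype=complex)]
    for _ in range(n_levels):
        z = stack[-1]
        rows, cols = z.shape
        # Element [a, p, b, q] of the reshaped array is z[2a + p, 2b + q],
        # so averaging over axes 1 and 3 averages the four children.
        stack.append(z.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3)))
    return stack

def to_decibels(stack, floor=1e-12):
    """Log-detect each level: 20*log10|Z| (assumed convention), floored to avoid log(0)."""
    return [20.0 * np.log10(np.maximum(np.abs(z), floor)) for z in stack]

# Tiny worked example on a 4 x 4 image with M = 2.
z0 = np.arange(16, dtype=float).reshape(4, 4) + 0j
Z = quadtree_stack(z0, n_levels=2)
X = to_decibels(Z)
```

Because the averaging is done on the complex data before detection, speckle phase partially cancels at coarser levels, which is what distinguishes coherent averaging from averaging the detected magnitudes.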


A MAR process defined on a multiscale stack $X$ satisfies

$$X^{(m)}_{i,j} = \sum_{k=1}^{R(m)} a^{(m)}_k \, X^{(m+k)}_{i \,\mathrm{div}\, 2^k,\; j \,\mathrm{div}\, 2^k} + \epsilon^{(m)}_{i,j}, \qquad (1)$$

where $R(m)$ is the order of the process for level $m$, $\epsilon^{(m)}_{i,j}$ is zero mean noise, and the coefficients $a^{(m)}_1, \ldots, a^{(m)}_{R(m)}$
are real valued model parameters. An intercept term can be added to each level if required. The same coefficients
apply to all pixels on the same level, but the coefficients for different levels need not be the same. For each level $m$,
the $\epsilon^{(m)}_{i,j}$ are independent and identically distributed, i.e. $\{\epsilon^{(m)}_{i,j} : (i,j) \in S^{(m)}\}$ is a spatially stationary,
strictly white noise field. We further assume that the noise fields of two different levels are independent.
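
For level 0, equation (1) is a linear regression of each pixel on its ancestors, so the coefficients can be estimated by ordinary least squares. The sketch below is one way to do this with NumPy, not the routine used by the authors; the function names are hypothetical, no intercept term is fitted, and the demonstration stack is synthetic and noise-free so that the recovered coefficients are known.

```python
import numpy as np

def ancestor_matrix(stack):
    """One row per level-0 pixel: its own value, then its ancestors at levels 1..M."""
    rows, cols = stack[0].shape
    i, j = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    # i >> k is integer division by 2**k, i.e. the ancestor index at level k.
    return np.column_stack([stack[k][i >> k, j >> k].ravel()
                            for k in range(len(stack))])

def fit_mar_level0(stack, order):
    """Least-squares estimate of a_1..a_R in equation (1) for m = 0, plus residuals."""
    data = ancestor_matrix(stack)
    y, regressors = data[:, 0], data[:, 1:order + 1]
    coeffs, *_ = np.linalg.lstsq(regressors, y, rcond=None)
    return coeffs, y - regressors @ coeffs

# Synthetic check: build level 0 exactly from its ancestors with a = (0.7, 0.2).
rng = np.random.default_rng(0)
x2 = rng.normal(size=(4, 4))
x1 = rng.normal(size=(8, 8))
i, j = np.meshgrid(np.arange(16), np.arange(16), indexing="ij")
x0 = 0.7 * x1[i >> 1, j >> 1] + 0.2 * x2[i >> 2, j >> 2]
coeffs, resid = fit_mar_level0([x0, x1, x2], order=2)
```

The fitted coefficients and the moments of the residual vector are then available as candidate features for a classifier.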


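To define a MARMA process on a multiscale stack $X$, we simply augment the above process as follows:

$$X^{(m)}_{i,j} = \sum_{k=1}^{R(m)} a^{(m)}_k \, X^{(m+k)}_{i \,\mathrm{div}\, 2^k,\; j \,\mathrm{div}\, 2^k} + \sum_{k=1}^{Q(m)} b^{(m)}_k \, \epsilon^{(m+k)}_{i \,\mathrm{div}\, 2^k,\; j \,\mathrm{div}\, 2^k} + \epsilon^{(m)}_{i,j}. \qquad (2)$$

Here $b^{(m)}_1, \ldots, b^{(m)}_{Q(m)}$ are the coefficients of a multiscale moving average process of order $Q(m)$.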

The  MAR  process  has  a  likelihood  function  which  readily  decomposes  into  a  product  of  terms  for  each 
pixel  and  is  amenable  to  computation.  This  makes  likelihood  approaches  to  model  fitting  and  to  classification 
straightforward,  as  demonstrated  in  [2].  By  contrast,  the  likelihood  function  for  MARMA  is  unwieldy.  Iterative 
least  squares  techniques  need  to  be  used  for  model  fitting,  and  likelihood  based  classification  is  problematic.  We 
address  this  last  issue  by  trying  to  use  the  fitted  coefficients  under  the  model  as  the  input  to  the  classification 
procedure,  rather  than  trying  to  classify  on  the  image  data  directly. 
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3  Singular  Value  Decomposition  Method 


An SVD feature approach, unproven and still in relative infancy, involves the analysis of high range resolution
(HRR) profiles [10, 5] for ATD/R systems. Singular Value Decomposition (SVD) analysis of HRR data reveals
that the range-space eigenvector corresponding to the largest singular value accounts for more than 90 percent
of target energy, so it may be that the eigenvectors can be used as target detection statistics. HRR analysis
has so far concentrated on ATR, so it is not clear what value may accrue to ATD algorithms based upon this
approach. For a SAR image the analogy would be to use sequences of cross track pixels from a region of interest.

The  motivation  for  interest  in  HRR  SVD  techniques  within  the  USAF  is  an  attempt  to  bridge  the  gap 
between  MTI  mode  and  SAR  stripmap  mode  of  wide  area  surveillance  systems.  MTI  is  good  at  detecting 
moving  targets,  but  ineffective  against  stationary  targets.  SAR  likewise  is  ineffective  against  moving  targets 
but  capable  of  accurately  imaging  stationary  targets.  It  is  possible  that  the  SVD  produces  a  useful  feature 
space  for  SAR  imagery  not  previously  exploited. 

A possible approach can be outlined as follows. Let $X$ be a size $N \times M$ matrix containing a region of interest
from a detected or complex SAR image. This would consist of $N$ range bins or cross track pixels and $M$ along
track focussed strips. The SVD results in

$$U = \text{eigenvectors of } [X X^{T}] = [u_1, u_2, \ldots, u_N],$$

$$V = \text{eigenvectors of } [X^{T} X] = [v_1, v_2, \ldots, v_M],$$

$$\Lambda = \mathrm{diag}[\lambda_{11}, \lambda_{22}, \ldots, \lambda_{MM}].$$

The eigenvectors $u_i$ correspond to the magnitude ordered eigenvalues $\lambda_{ii}$, and represent energy from a target
range profile (cross track pixels). Likewise, eigenvectors $v_i$ span the along track subspace. It is predicted that
the target energy is concentrated in eigenvectors $u_1$ and $v_1$. If the region of interest contains clutter only, it
is predicted that the eigenvectors are evenly distributed in their components. A simple linear discriminant or
other classifier can then be designed to classify a region as target or clutter only.
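
A minimal sketch of extracting such features with NumPy is given below. The function names are hypothetical, and taking the moments of the component magnitudes is an assumption, since the text does not say how the moments of complex eigenvectors are formed. `np.linalg.svd` handles both real (detected) and complex input, using the conjugate transpose in the complex case.

```python
import numpy as np

def svd_features(roi):
    """Leading singular vectors u1, v1 and the energy fraction of the first mode."""
    u, s, vh = np.linalg.svd(np.asarray(roi), full_matrices=False)
    u1 = u[:, 0]            # cross track (range) direction
    v1 = vh[0].conj()       # along track direction; rows of vh are conjugated columns of V
    energy_fraction = s[0] ** 2 / np.sum(s ** 2)
    return u1, v1, energy_fraction

def moments(vec):
    """Mean, variance, skewness and kurtosis of the component magnitudes (assumed form)."""
    a = np.abs(vec)
    mu, var = a.mean(), a.var()
    sd = np.sqrt(var)
    skew = np.mean((a - mu) ** 3) / sd ** 3
    kurt = np.mean((a - mu) ** 4) / sd ** 4
    return mu, var, skew, kurt

# A rank-one patch: all of its energy lies in the first singular mode.
roi = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 0.0, 1.0])
u1, v1, frac = svd_features(roi)
features = moments(u1) + moments(v1)   # eight-element feature tuple
```

Note that singular vectors are defined only up to a sign (or a phase for complex data), which is one reason to work with magnitudes or moments rather than the raw components.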


4  Other  Selected  Features/Discriminators 


Along  with  the  features  previously  discussed,  numerous  other  features  for  magnitude  only  data  such  as  maximum 
target  intensity,  Karhunen  Loeve  Transform  (KLT)  based  methods  and  background  distribution  models  (such 
as  the  U,V  and  W  measures  for  K-distribution  parameter  estimation  described  in  [8])  have  also  been  considered. 
After  extensive  testing  over  a  large  and  realistic  database  containing  53961  background  and  22084  target  samples, 
it  was  found  that  the  best  single  feature  found  to  date  for  target  detection  is  the  V  measure  for  the  estimation 
of  the  parameters  of  the  K  distribution. 

A  number  of  co-occurrence  matrix  features  have  already  been  described  in  Redding  [9].  Lie  [7]  contains 
another  set  of  co-occurrence  based  texture  features,  such  as  the  ‘busyness’,  Weber’s  contrast,  ‘average  contrast’ 
and ‘homogeneity’. This paper also outlines an algorithm for efficiently calculating these measures.

There  are  many  different  ways  in  which  an  image  can  be  decomposed  into  a  weighted  sum  of  orthonormal 
basis  functions,  where  the  weights  can  be  used  as  features.  Ghosal  and  McKee  [6]  for  instance  use  a  Zernike  basis, 
while  Rong  and  Bhanu  [11]  derive  three  features  based  on  the  means  and  moments  of  an  image’s  coefficients  in 
a  Gabor  basis. 

A  set  of  features  that  seems  very  promising  for  target  detection  is  the  SVD  of  the  FFT.  Although  no  firm 
physical  explanation  has  been  found  for  this,  the  first  row  of  the  first  matrix  produced  by  the  SVD  yields  a 
reasonably  good  ROC  curve  as  shown  in  Table  3.  The  best  5  other  features  (maximum  intensity,  one  FFT 
coefficient,  variance,  skewness  and  a  parameter  derived  from  the  KLT)  gave  a  similarly  good  ROC.  Choosing 
the  best  8  features  from  all  of  the  features  tested  (maximum  intensity,  4  SVD  of  FFT  coefficients,  one  FFT 
coefficient  and  the  skewness)  gave  a  FAR  of  about  7.8  percent  for  a  PD  of  90  percent. 
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Clearly,  Automatic  Target  Detection  can  be  considered  as  a  form  of  target  classification,  wherein  just 
two classes exist: Target only or Background Clutter only. It may be advantageous to approach ATD as
a pattern classification problem and explore the use of modern classifiers such as Support Vector Machines
(SVMs) or Neural Networks (NNs), to name just two possible techniques. The potential use of SVMs for target
discrimination is summarised in [9].


5  Results 


The multiscale models described in the previous section were each fitted to samples of “Background,” “Grassland,”
“Woodland,” and “Vehicles” extracted from 1.5 m resolution SAR imagery. Additionally, eigenvectors
from  an  SVD  were  computed  from  the  magnitude  imagery.  The  imagery  was  collected  during  trials  of  DSTO’s 
Ingara  airborne  SAR.  There  were  53  samples  of  the  background  class,  256  samples  of  homogeneous  grass,  256 
samples  of  woodland,  and  159  samples  of  the  same  vehicle  from  different  orientations.  Each  sample  was  a  16  x  16 
matrix  of  complex  SAR  data  which  was  processed  into  a  four  level  stack.  Model  fitting  was  accomplished  by 
transforming  each  stack  into  a  256  x  4  data  matrix  suitable  for  use  as  input  into  the  standard  time  series  fitting 
routines in MATLAB.

For  this  work,  we  only  estimated  the  coefficients  for  level  0  of  the  multiscale  stack.  Four  different  models 
were  fitted  to  the  SAR  data: 


•  a  second  order  autoregression  denoted  MAR(2); 

•  a third order autoregression denoted MAR(3);

•  a  fourth  order  autoregression,  MAR(4);  and 

•  a  second  order  autoregressive  moving  average  denoted  MARMA(2,2). 


Table 1 summarises the performance of the multiscale techniques using the area under the ROC curves derived
from coefficients and/or residual vector moments under several multiscale model cases. In most cases, the
coefficients vary quite widely and the two populations overlap to varying extents. However, linear discrimination
works (albeit with some error) for certain model sizes. Several model orders are invoked: MAR(2), MAR(3),
MAR(4), and MARMA(2,2), over regions of interest (ROI) sized 8 x 8, 16 x 16, and 32 x 32. The complex
multiscale stacks are created either by complex quadtree averaging or by use of a “db1” wavelet transform; the results
are similar. We caution, however, that the background clutter is reasonably homogeneous without additional
cultural features, and that the targets are comparatively strong. In all cases the targets are multiple images of
the same vehicle from different orientations.
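
For a scalar discriminant output, the area under the ROC curve equals the probability that a randomly chosen target sample scores higher than a randomly chosen background sample, with ties counted as one half. The sketch below computes the area that way; it is an illustration of the metric only, not the estimation procedure used for the tables, and the function name is hypothetical.

```python
def roc_area(target_scores, background_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for t in target_scores:
        for b in background_scores:
            if t > b:
                wins += 1.0      # target correctly ranked above background
            elif t == b:
                wins += 0.5      # ties count as half
    return wins / (len(target_scores) * len(background_scores))
```

Perfectly separated classes give an area of 1.0 and identical score distributions give about 0.5, which is why values near .5 in the tables indicate a feature with little discriminating power. The pairwise loop is quadratic in the number of samples; a rank-based computation would be preferable for data sets of the size used for Table 3.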

Quadratic  discriminant  rules  were  also  formulated  from  the  training  data  for  each  of  the  multiscale  models. 
Classifiers based upon these rules performed poorly relative to the linear discriminant rules. Examination of the
scatterplots indicates a possible reason for this. The fitted coefficients of each class are quite widely dispersed
with  plenty  of  outliers.  As  a  result,  the  sample  covariance  matrices  needed  for  quadratic  discriminators  would 
be  poor  estimators  of  the  underlying  population  covariance  matrices.  We  are  currently  exploring  the  use  of  a 
Support  Vector  Machine  (SVM)  as  an  alternative  to  linear  discriminant  classifiers. 

Typical results from use of SVD derived features (over 16 x 16 region size) for the target discrimination
task are shown in Table 2. Specifically, the area under the ROC curve using both the $u_1$ and $v_1$ eigenvectors, or
32 coefficients in total, as input to the linear discriminant is shown, along with results from using just the moments of the
$u_1$ and $v_1$ eigenvectors. It can be seen that using the variance of the first two eigenvectors produced a perfect
ROC curve. However, recall that these results are obtained on “ideal” data in that the background clutter is
homogeneous and the target signal strength is high. It can be seen from Table 3 that testing on low contrast data
results in an obvious degradation in target discrimination performance.

For the features used in Table 3 which are based only on the magnitude of the radar return, a larger data
set with more realistic contrast levels was available for testing. This data set was extracted from a large image




MAR Model     Area   Comments/Features
MAR(2)        .560   159 Vehicles/Grass, 8 x 8 Regions, MAR Coefficients
MAR(2)        .928   159 Vehicles/53 Backgrounds, 16 x 16 Regions, Intensity Data, Peak Freq Difference
MAR(2)        .922   159 Vehicles/53 Backgrounds, 16 x 16 Regions, Intensity Data, Min Freq Location
MAR(3)        .910   159 Vehicles/Grass, 16 x 16 Regions, MAR Coefficients
MAR(4)        .999   159 Vehicles/Grass, 16 x 16 Regions, 3 MAR Coefficients/3 Residual Vector Moments
MAR(4)        .990   159 Vehicles/Grass, 16 x 16 Regions, 4 Residual Vector Moments
MARMA(2,2)    .998   159 Vehicles/Grass, 32 x 32 Regions, MARMA Coefficients
MARMA(2,2)    .990   159 Vehicles/Grass, 16 x 16 Region, MARMA Coefficients/Variance of Residual Vector
MARMA(2,2)    .990   300 Vehicles/Grass, 16 x 16 Region, MARMA Coefficients/Variance of Residual Vector
MARMA(2,2)    .980   327 Vehicles/Grass, 16 x 16 Region, MARMA Coefficients/Variance of Residual Vector
MARMA(2,2)    .970   159 Vehicles/Woods, 16 x 16 Region, MARMA Coefficients/Variance of Residual Vector
MARMA(2,2)    .940   159 Vehicles/Woods, 16 x 16 Region, MARMA Coefficients/Variance of Residual Vector
MARMA(2,2)    .930   159 Vehicles/Woods, 16 x 16 Region, MARMA Coefficients/Variance of Residual Vector

Table 1: Estimated area under a ROC curve for linear discriminant applied to the features derived from
multiscale models


SVD Vectors    Area   Comments/Features
$u_1, v_1$     .994   159 Vehicles/53 Backgrounds, 8 x 8 Region, $u_1$ and $v_1$ Used
$u_1, v_1$     .964   159 Vehicles/53 Backgrounds, 8 x 8 Region, Mean of $u_1, v_1$
$u_1, v_1$     .964   159 Vehicles/53 Backgrounds, 8 x 8 Region, Variance of $u_1, v_1$
$u_1, v_1$     .836   159 Vehicles/53 Backgrounds, 8 x 8 Region, Skewness of $u_1, v_1$
$u_1, v_1$     .533   159 Vehicles/53 Backgrounds, 8 x 8 Region, Kurtosis of $u_1, v_1$
$u_1, v_1$     .993   159 Vehicles/53 Backgrounds, 16 x 16 Region, $u_1$ and $v_1$ Used
$u_1, v_1$     .926   159 Vehicles/53 Backgrounds, 16 x 16 Region, Mean of $u_1, v_1$
$u_1, v_1$     .927   159 Vehicles/53 Backgrounds, 16 x 16 Region, Variance of $u_1, v_1$
$u_1, v_1$     .888   159 Vehicles/53 Backgrounds, 16 x 16 Region, Skewness of $u_1, v_1$
$u_1, v_1$     .720   159 Vehicles/53 Backgrounds, 16 x 16 Region, Kurtosis of $u_1, v_1$

Table 2: Estimated area under a ROC curve for linear discriminant applied to the features derived from an
SVD


into  which  were  inserted  512  real  targets  at  a  variety  of  contrasts.  A  prescreening  algorithm  was  applied  to 
this  large  image,  which  resulted  in  the  extraction  of  76045  64  x  64  images  (although  only  an  8  x  8  subset  of 
this,  corresponding  roughly  to  the  target  pixels,  was  used  for  the  calculation  of  the  features).  53961  of  these 
contained  background,  while  22084  contained  inserted  targets.  This  larger  data  set  is  much  more  challenging 
than  the  complex  data  set,  as  can  be  seen  by  a  comparison  of  the  area  under  the  ROC  curves  for  the  Maximum 
Intensity,  which  was  0.974  for  the  smaller  complex  data  set,  but  0.845  for  the  new  data  set. 


6  Conclusions 


Our  results  show  that  linear  discrimination  of  three  types  of  natural  background  in  SAR  imagery,  “woodland,” 
“background,”  and  “grassland”,  from  man  made  bright  objects  is  possible  using  16  x  16  samples  when  the 
classification  is  made  using  the  fitted  coefficients  and  residual  vector  moments  under  one  of  several  multiscale 
models.  The  autoregressive  component  of  “grassland”  is  first  order  on  average,  while  that  of  “woodland”  is 
second  order.  Adding  a  first  order  moving  average  component  improves  the  classification  accuracy  beyond  that 
attainable  using  pure  multiscale  autoregression.  The  data  are  very  clean,  thus  more  extensive  testing  of  the 
MAR  and  MARMA  method  on  low  resolution  SAR  data  would  be  necessary  before  conclusions  can  be  made. 

The  SVD  methods  worked  exceptionally  well  on  the  clean  data,  especially  using  the  variance  of  just  two 
eigenvectors  as  a  two  element  feature  vector.  Unfortunately,  the  results  degrade  significantly  when  tested  against 
lower  contrast  and  higher  clutter  data.  The  SVD  still  produced  better  discrimination  than  most  other  individual 
features  on  this  data. 

SVD/K/FFT      Area   Comments/Features
K Parameters   .821   All Clutter and Contrasts, 8 x 8 Region, U Measure Used as Feature
K Parameters   .851   All Clutter and Contrasts, 8 x 8 Region, V Measure Used as Feature
K Parameters   .795   All Clutter and Contrasts, 8 x 8 Region, W Measure Used as Feature
$u_1$          .726   All Clutter and Contrasts, 8 x 8 Region, Variance of $u_1$
$v_1$          .840   All Clutter and Contrasts, 8 x 8 Region, Variance of $v_1$
$u_1$          .841   All Clutter and Contrasts, 8 x 8 Region, Eigenvector $u_1$ Used as Feature
$v_1$          .841   All Clutter and Contrasts, 8 x 8 Region, Eigenvector $v_1$ Used as Feature
Not Applied    .845   All Clutter and Contrasts, 8 x 8 Region, Max Intensity Used as Feature
Not Applied    .974   786 Vehicles/4096 Backgrounds, 8 x 8 Region, Max Intensity Used as Feature
FFT            .778   All Clutter and Contrasts, 8 x 8 Region, Real FFT Coefficient Used
FFT            .926   786 Vehicles/4096 Backgrounds, 8 x 8 Region, Real FFT Coefficient Used
K/FFT/SVD      .901   All Clutter and Contrasts, 8 x 8 Region, 5 Best Features Combined
SVD/FFT        .915   All Clutter and Contrasts, 8 x 8 Region, 8 SVD of FFT Used as Feature
SVD-FFT/K      .967   786 Vehicles/4096 Backgrounds, 8 x 8 Region, 9 Best Features Combined

Table 3: Estimated area under a ROC curve for linear discriminant applied to the features derived from an
SVD, K Distribution Parameters, and FFT Coefficients Using a Range of Contrast and Clutter Ratios

In this paper, we consider turbo equalization for fast-fading frequency-selective
channels when perfect channel state information is not available
at the receiver. The receiver performs channel estimation/equalization and
decoding in an iterative fashion using soft-decisions. Simulation studies of
a turbo equalizer based on our recently proposed single stage joint channel
estimation and equalization algorithm demonstrate the impressive performance
of turbo equalization.

1.  INTRODUCTION

Transmission channels of modern high data rate, high mobility, digital communication
systems often exhibit both fast-fading and intersymbol interference (ISI)
characteristics. As a consequence, equalization can sometimes seem an overwhelmingly
difficult task. Nevertheless, new turbo equalizers have been developed, and
have proved to be very effective in compensating for both the fast-fading and the
ISI effects in these channels.

Turbo processing originally arose in the context of channel coding for error correction
[2] in additive white Gaussian noise (AWGN) channels. The term turbo refers
to the fact that the data is processed multiple times in a feedback arrangement
before a final digital decision is made. Recently, turbo processing has been demonstrated
in the context of equalization for coded data transmitted over fast-fading
ISI channels [3, 4, 5]. In each of these works, perfect channel state information has
been assumed to be available to the equalizer.

In the absence of perfect channel state information, traditional (non-turbo) adaptive
equalizers use least mean squares (LMS) or recursive least squares (RLS) algorithms
to directly update linear FIR equalizer coefficients [6, 7]. Explicit estimation
of the channel is avoided, but these equalizers are unable to adequately compensate
for the fast-fading environment.
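
As a point of reference for the direct-update approach, the sketch below adapts a short FIR equalizer with the standard complex LMS recursion on a static two-tap channel. It is a generic textbook illustration, not an algorithm from this paper: the function name, tap count, step size and channel are all assumed values, and a static noise-free channel is used precisely because it is the benign case in which LMS does converge.

```python
import numpy as np

def lms_equalize(received, training, n_taps=5, step=0.01):
    """Adapt FIR equalizer taps with LMS so the output tracks the training symbols."""
    w = np.zeros(n_taps, dtype=complex)
    out = np.zeros(len(received), dtype=complex)
    for n in range(n_taps - 1, len(received)):
        x = received[n - n_taps + 1:n + 1][::-1]   # most recent sample first
        out[n] = np.vdot(w, x)                     # equalizer output y = w^H x
        err = training[n] - out[n]                 # error against the known symbol
        w = w + step * x * np.conj(err)            # LMS tap update
    return w, out

rng = np.random.default_rng(1)
symbols = rng.choice([-1.0, 1.0], size=5000)           # BPSK training sequence
received = np.convolve(symbols, [1.0, 0.4])[:5000]     # static two-tap ISI channel
taps, out = lms_equalize(received, symbols)
```

With these settings the taps need a few hundred symbols to settle, which gives a feel for why such an update struggles when the channel changes appreciably over a comparable number of symbols.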

An alternative approach is to separately estimate the channel, and provide the
estimates on-line to an equalizer. Channel estimators which learn and exploit the
statistics of the channel with Kalman filtering and prediction have been demonstrated
to have improved performance in the fast-fading frequency-selective environment
[8, 9, 10]. For flat-fading channels (i.e. not frequency-selective, no ISI),
this approach may be realized using pilot symbols and interpolation [11].

Recently, a more integrated approach to channel estimation and equalization
has evolved, known as per-survivor processing (PSP) [12]. Based on maximum
likelihood sequence estimation (MLSE), channel estimates are formed using Kalman
filters along each of the surviving hypothesis paths in the MLSE trellis. Thus
there is a different channel estimate for each state in the trellis, and the result
is joint channel and data estimation. PSP has been demonstrated to outperform
conventional techniques in fast-fading ISI channels [12], and reduced complexity
algorithms have been proposed [13, 14, 15, 16]. Unfortunately, the PSP approach
is not suited to the turbo structure because MLSE entails hard decisions.

The maximum a posteriori (MAP) algorithm [17, 18] has received much attention
as an alternative to MLSE for equalization. This is because it provides optimal soft
decisions in contrast to the hard decision MLSE approach. When the a posteriori
probabilities are retained as soft decisions, the algorithm is often referred to as
the APP algorithm [19]. Unlike MLSE, there is no concept of a surviving path to
each state, and therefore where joint equalization and channel estimation is to be
achieved, channel estimates must be based only on the trellis state. Our recent
work [1, 20] provides a generalized framework for achieving this by expanding the
trellis state-space. In flat-fading channels, the algorithm reduces to that described
in [21, 22].

This paper focusses on incorporating our APP equalizer [1, 20] (with joint channel
estimation) into a turbo processing structure. The receiver performs channel
estimation/equalization and decoding in an iterative fashion using soft decisions.
Simulation studies of a turbo equalizer based on our recently proposed single stage
joint channel estimation and equalization algorithm demonstrate the impressive
performance of turbo equalization.

2.  TURBO  EQUALIZATION 

In  this  section  we  consider  the  motivation  for  turbo  equalization  and  review 
turbo  processing  principles  in  the  context  of  equalization  for  coded  data  transmitted 
over  fast-fading  ISI  channels.  We  then  describe  a  turbo  equalizer  configuration 
incorporating  our  APP  equalizer  (with  joint  channel  and  data  estimation)  as  well 
as  an  alternative  using  a  single  channel  estimate. 

2.1.  The  Optimal  Receiver 

It is well known that the optimal receiver for any communication system takes all
available information (on the channel, modulation format, coding, a priori knowledge
of data source statistics, for example) into account in finding a joint estimate
of all the unknown parameters, which includes the transmitted data. Even if it
were possible to design such a complicated receiver, the computational burden
would more than likely be prohibitive. Furthermore, modularity is important in
any practical system. Thus, from both design and implementation considerations,
it is desirable to separate the receiver into a cascade of processing subsystems (or
stages).

FIG. 1. Transmission system with turbo equalization.

From an information theoretic point of view, hard decisions at the output of any
processing stage result in a loss of information. Soft decisions reflect confidence in
the decision, and this information may be further utilized by subsequent processing
stages. However, even if soft decisions are passed between all stages, the overall
processing may be far from optimal. This is because later stages benefit from
information derived at earlier stages, but not vice versa. This has motivated iterative
(i.e. turbo) processing at the receiver where soft decisions from later stages (e.g.
decoding) are fed back as a priori information to earlier stages (e.g. equalization)
to refine decisions.


2.2.  Turbo  Equalization 

For illustration of turbo equalization in this paper, we consider the simplest
configuration with two stages, i.e. an equalization stage and a decoding stage. The
concepts are easily extended for turbo equalization with further stages (corresponding
to outer coding or turbo coding, for example).

Fig.  1  shows  the  complete  transmission  system.  The  information  available  to  the 
receiver  consists  of  the  channel  output  (in  the  form  of  a  set  of  sufficient  statistics 
from  the  matched  filter  at  the  receiver  front-end),  and  any  a  priori  information  on 
the  transmitted  symbols  (e.g.  pilot  symbols,  zero-tailing). 

The  equalizer  (first  stage)  takes  the  channel  output  and  forms  soft  estimates  of  the 
transmitted  symbols.  The  decoder  (second  stage)  then  uses  these  soft  estimates  and 
knowledge  of  the  coding  algorithm  to  form  soft  estimates  of  the  original  data,  and 
in  doing  so,  forms  a  refined  estimate  of  the  transmitted  symbols.  These  estimates 
are  then  passed  back  to  the  equalizer  as  a  priori  transition  probabilities  for  the 
next  iteration  of  equalization  and  decoding  on  the  same  data  block.  This  feedback 
(iteration)  process  repeats,  either  until  the  data  estimates  converge,  or  until  a 
processing delay limit is reached.

Important in the implementation of turbo processing is the concept of extrinsic
information  for  each  stage.  This  is  the  information  about  the  data  estimates,  which 
is  generated  from  the  processing  at  that  particular  stage.  In  subsequent  turbo 
iterations,  this  extrinsic  information  must  not  be  included  in  the  input  to  that  same 
stage  of  processing  (hence  the  two  subtractions  of  information  at  each  iteration  in 
Fig.  1). 

Formally, the information about the n-th transmitted symbol at the output of
stage d may be written in terms of log-likelihood ratios as

$$L_{d,n} = L^{a}_{d,n} + L^{e}_{d,n} \qquad (1)$$

where $L^{a}_{d,n}$ is the a priori information (consisting of information available to the
receiver and extrinsic information contributed by other stages of processing), and
$L^{e}_{d,n}$ is the extrinsic information for stage d.

The relationship (1) holds only if the soft decisions input to the stage (represented
by $L^{a}_{d,n}$) are independent over time n. For the case of a channel with memory
(due to ISI or correlation in the fading), this will not be true as information at the
output of the equalizer will be correlated. Further, by the very nature of coding, the
refined estimates of the transmitted symbols will also be correlated. Fortunately,
these correlations dominate symbols which are closely located, and therefore may
be overcome by interleaving the data between coding and transmission through the
channel as shown in Fig. 1.
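
As a minimal sketch of the bookkeeping implied by (1), the following Python fragment removes the a priori log-likelihood ratios from the output of a stage, leaving the extrinsic part that is passed to the other stage. The function name and the list representation are illustrative assumptions and are not taken from the receiver described here.

```python
def extrinsic_llr(output_llr, a_priori_llr):
    """Return the extrinsic LLRs of a stage: stage output minus a priori input."""
    if len(output_llr) != len(a_priori_llr):
        raise ValueError("LLR sequences must have the same length")
    return [lo - la for lo, la in zip(output_llr, a_priori_llr)]


# Hypothetical LLR values for three symbols.
a_priori = [0.5, -1.0, 0.0]
stage_output = [2.0, -1.5, 0.75]
print(extrinsic_llr(stage_output, a_priori))  # [1.5, -0.5, 0.75]
```

This subtraction corresponds to each of the two subtractions of information per iteration indicated in Fig. 1.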

2.3.  Joint  Channel  Estimation  &  Equalization 

In  the  absence  of  perfect  channel  state  information  at  the  receiver,  the  APP 
equalizer  forming  the  first  stage  of  the  turbo  configuration  (as  shown  in  Fig.  1) 
can  be  implemented  in  one  of  two  ways.  Firstly,  a  separate  channel  estimator  may 
be  used  to  provide  channel  state  information  to  the  APP  algorithm.  Secondly, 
a  more  sophisticated  joint  channel  estimation  and  equalization  may  be  achieved 
using  an  APP  equalizer  with  expanded  state-space  as  described  in  [1].  The  latter 
configuration  is  the  focus  of  this  paper. 

The operation of the APP algorithm may be represented by a state trellis [7]. It
accepts a priori symbol probabilities (soft inputs) and produces a posteriori symbol
probabilities via the forward-backward recursions [19]. At each time n, and for each
state transition ij in the trellis, the following expression needs to be evaluated:

$$\exp\left\{ -\frac{1}{\sigma^{2}} \left| z_n - \hat{z}_{n,ij} \right|^{2} \right\} \qquad (2)$$

where $z_n$ is the output of the receiver matched filter, $\sigma^{2}$ is the observation noise
variance, and

$$\hat{z}_{n,ij} = \sum_{\ell=0}^{L-1} x_{n-\ell,ij} \, f_{n,\ell} \qquad (3)$$

where L is the length of the channel, $\{x_{n-\ell,ij}\}$ are the data hypothesis values
represented by the transition ij for which the calculation is being made, and $f_{n,\ell}$ is the
time-varying channel impulse response.
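
The transition calculation described above can be sketched in a few lines of Python. The fragment forms the noise-free channel output for one hypothesised transition as the sum of data hypotheses weighted by the channel taps, then evaluates a Gaussian likelihood term for the observed matched filter output. The function names are hypothetical, and the scaling of the exponent by one over the noise variance (rather than one over twice the variance) is an assumption appropriate to complex baseband noise.

```python
import math


def predicted_output(symbols, taps):
    """Noise-free channel output for one transition.

    symbols[l] is the hypothesised symbol l steps in the past and
    taps[l] is the corresponding channel tap at the current time.
    """
    return sum(x * f for x, f in zip(symbols, taps))


def transition_likelihood(z, symbols, taps, noise_var):
    """Gaussian likelihood term (up to a constant factor) for one transition."""
    error = z - predicted_output(symbols, taps)
    return math.exp(-abs(error) ** 2 / noise_var)


# Hypothetical two-tap channel with BPSK hypotheses (current symbol +1, previous -1).
taps = [1.0, 0.5]
hypothesis = [1, -1]
print(predicted_output(hypothesis, taps))                       # 0.5
print(transition_likelihood(0.5 + 1j, hypothesis, taps, 1.0))   # exp(-1)
```

In a full forward-backward recursion this term would be multiplied by the a priori probability of the transition; that step is omitted here.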

If the channel, $f_{n,\ell}$, is known exactly, or estimates, $\hat{f}_{n,\ell}$, are provided to (3)
by a separate channel estimation algorithm, then the state trellis for the channel
has $Q^{L-1}$ states, where Q is the size of the modulation symbol set (e.g. Q = 8
for 8-PSK), and L is the number of taps in the FIR channel model (i.e. channel
memory). The key to achieving joint channel estimation and equalization with a
MAP receiver is to expand the state space of the trellis to $Q^{L+P-1}$ states, where
P is the number of additional hypotheses. This expansion ensures that there is
enough extra memory in the trellis to estimate the channel for each trellis
state i in (3). Minimum mean square error (MMSE) estimators can then be used
with these additional hypotheses to provide a different channel estimate for each
state in the trellis [1]. The channel estimate takes the form:

$$\hat{f}_{n,\ell,i} = \sum_{p=1}^{P} w_{p,\ell,i} \, z_{n-p} \qquad (4)$$

where $w_{p,\ell,i}$ are the MMSE channel coefficients corresponding to the hypothesis of
state i.

Since channel estimation is performed as part of the receiver signal processing,
the receiver front-end is non-coherent. When phase-shift keying (PSK) is used for
the transmitted symbols, pilot symbols are required to resolve the phase ambiguity
within the equalizer. These pilot symbols are handled seamlessly by the APP
algorithm. However, they incur a penalty in SNR and reduce the effective bandwidth
used to transmit the data. (Note that the expanded state equalizer [1] is quite
general in that it is not restricted to PSK modulation, and can be modified for
differentially encoded transmission [20].)

3.  SIMULATIONS 

The performance of a turbo equalizer incorporating our APP equalizer (with joint
channel and data estimation) is demonstrated here by simulation of a binary PSK
system. At the transmitter, the message is convolutionally encoded with rate 1/2
(octal generators (133, 171), code memory 6). Following interleaving, pilot symbols
are added (1 : 8) to the symbol sequence transmitted over a fast-fading channel.
Here we consider both a flat, and a two-path (symbol period T spaced) ISI
fast-fading Rayleigh channel with Doppler spreading described by Jakes' spectrum. The
normalized fading rate is $f_D T = 0.5$ in both cases. The APP equalizer is symbol
spaced, and uses P = 3 for channel prediction of the flat channel, whilst in the
two-path case, a larger prediction order (here P = 6) is necessary to adequately
estimate the ISI. In both simulations, up to five turbo iterations are carried out
by the receiver. Fig. 2 shows the impressive BER performance gains achievable
with turbo iterations (numbered 1 to 5). For reference, the BER performance of
a non-iterative equalizer/decoder with perfect channel state information (CSI) is
also shown. Missing data points are associated with low BER, and are the subject
of ongoing (longer) simulations.
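
For readers wishing to reproduce the transmitter side of such a simulation, the following Python sketch implements a rate 1/2 convolutional encoder with octal generators (133, 171) and code memory 6. The bit ordering (the most significant generator bit taps the current input bit) is a common convention assumed here; it is not stated in the text. Trellis termination, interleaving and pilot insertion are omitted.

```python
G1, G2 = 0o133, 0o171        # generator polynomials in octal
CONSTRAINT_LENGTH = 7        # code memory 6 plus the current input bit


def parity(value):
    """Modulo-2 sum of the bits of a non-negative integer."""
    return bin(value).count("1") % 2


def conv_encode(bits):
    """Encode a list of 0/1 bits, producing two output bits per input bit."""
    mask = (1 << CONSTRAINT_LENGTH) - 1
    register = 0
    coded = []
    for bit in bits:
        # Shift the register and place the new bit in the most significant position.
        register = ((register >> 1) | (bit << (CONSTRAINT_LENGTH - 1))) & mask
        coded.append(parity(register & G1))
        coded.append(parity(register & G2))
    return coded


# The response to a single 1 followed by zeros reads out the generator taps.
impulse = conv_encode([1, 0, 0, 0, 0, 0, 0])
print(impulse[0::2])  # [1, 0, 1, 1, 0, 1, 1]  (133 octal)
print(impulse[1::2])  # [1, 1, 1, 1, 0, 0, 1]  (171 octal)
```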
The purpose of this article is to review the functionality of a radar signals
analysis package that has been developed for the processing and extraction
of ELINT information. At the same time we highlight some of the technical
challenges that face a radar intercept receiver in collecting and analysing
the signals that make up the modern radar signal environment.

1. INTRODUCTION

Electronic intelligence, or ELINT, involves the collection of signals intelligence
through the interception of non-communications electromagnetic emissions. Signals
of interest may originate from a variety of man-made electromagnetic sources
including radars, transponders, data links and identification friend or foe (IFF) [9].
However, our interest here will be limited to radar-based ELINT. Because of the
wideband nature of the modern ELINT receiver, a radar signal of interest (SOI)
will often be recorded in the presence of other, possibly many, interference signals
and the SOI must be isolated through deinterleaving processing before detailed
parametric information can be extracted from the signal. In the context of radar
signal analysis, deinterleaving refers to the process of isolating signals from a time
interleaved record of pulsed radar emissions. In addition to basic intelligence
gathering, which involves collecting, analysing and locating the sources of the intercepted
radar emissions, strategic ELINT operations are used to build radar emitter libraries
or data bases. These libraries are then employed in tactical missions by a radar
intercept receiver for emitter identification purposes, including threat warning.

In this article we provide an overview of the functionality of a signals analysis
package dedicated to ELINT processing and extraction and, in so doing, highlight
some of the technical challenges that face a radar intercept receiver in processing the
signals associated with the modern radar signal environment. IDEA, the Interactive
Deinterleaver for ELINT Analysis, is a software package for laboratory use that
incorporates a number of tools that would be familiar to an experienced ELINT
analyst, but also includes several novel signal processing methods that have been
developed by researchers at the Defence Science and Technology Organisation.

FIG. 1. A generic ELINT signal classification system (block labels: Raw Data, Feature Vector).


2. ELINT PROCESSING AND EXTRACTION

In the case of pulsed radar emissions, ELINT processing and extraction is generally
carried out by analysing a series of digital pulse descriptor words (PDWs)
that is created by the radar intercept receiver. A PDW may be viewed as a feature
vector that characterises each pulse recorded by the receiver in a compact fashion.
A generic ELINT system that illustrates this concept is shown in Fig. 1. Typical
signal descriptors include pulse angle of arrival (AOA), radio frequency (RF), pulse
width (PW), pulse amplitude (PA), and pulse time of arrival (TOA), together with
intra-pulse features such as within-pulse phase and frequency modulation. From
these parameters one can deduce the pulse repetition interval (PRI) of a signal,
which corresponds to the time difference of arrival (TDOA) between two successive
pulses, as well as the scan period between consecutive radar sweeps.
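
To make the PDW concept concrete, the following Python sketch represents a pulse descriptor word as a small record and derives a PRI sequence as the TDOA between successive pulses. The field names and the units noted in the comments are illustrative assumptions; they do not describe the PDW format of any particular receiver.

```python
from dataclasses import dataclass


@dataclass
class PDW:
    toa: float  # time of arrival (assumed microseconds)
    aoa: float  # angle of arrival (assumed degrees)
    rf: float   # radio frequency (assumed MHz)
    pw: float   # pulse width (assumed microseconds)
    pa: float   # pulse amplitude (assumed dB)


def pri_sequence(pdws):
    """TDOA between successive pulses of a single, already isolated signal."""
    toas = sorted(p.toa for p in pdws)
    return [later - earlier for earlier, later in zip(toas, toas[1:])]


# Hypothetical pulses from one emitter with a constant PRI of 1000 microseconds.
pulses = [PDW(toa=1000.0 * n, aoa=30.0, rf=9400.0, pw=1.0, pa=-40.0) for n in range(4)]
print(pri_sequence(pulses))  # [1000.0, 1000.0, 1000.0]
```

The calculation is only meaningful once the pulses of a single emitter have been separated from the interleaved record.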

Pulse AOA is an important signal discriminator for isolating a SOI since it cannot
be influenced by a radar emitter in a deceptive manner. Other parameters, such
as RF and PRI, can be and frequently are modified on a pulse-by-pulse basis by
a modern radar source, either as a means of providing radar emitter performance
enhancement, or as a method of defeating the intercept receiver. Significantly,
pulse-by-pulse signal agility has the potential to confuse an intercept receiver and
may lead to signal fragmentation. This can be a particularly serious problem in
automated processing systems and may result in multiple radar emitters being
reported by an intercept receiver, rather than a single emitter with parameter agility.
A major challenge to radar signal intercept analysis is to add a further layer of
information processing to the analysis sequence to correct for signal fragmentation.

With the advent of precision digital receiver technology, electronic fingerprinting
can, in principle, provide a means of assisting with the deinterleaving process and
involves the extraction of a unique signature from a radar signal. However,
signal-to-noise-ratio considerations do come into play with these methods and may inhibit
the performance of some of the techniques currently under investigation [5].

3.  THE  IDEA  SIGNALS  ANALYSIS  PACKAGE 

In this section we provide an overview of the functionality of IDEA and concentrate
on recent enhancements to the package. An earlier review appeared in [2].

3.1.  Background 

IDEA is a computer-based signals analysis package written in C++ and designed
to run on Unix workstations. As IDEA's name would suggest, there is a strong
emphasis on interactivity by the human operator and this is intrinsic to IDEA's
design philosophy. The interactivity is realised through a range of visualisation and
analysis tools that are made available to the operator via a graphical user interface.
A number of data viewers and signal detector windows also update automatically
as a new tool is invoked, but retain the previous view (plotted in a different colour)
so that the effect of the new tool can be assessed visually. In addition to the signal
analysis tools, IDEA contains an in-built signal generation capability, allowing the
user to generate a range of synthetic time interleaved signals of varying complexity.

FIG. 2. The main processing window of IDEA.

3.2. IDEA's Tree Structure

The main processing window of IDEA is shown in Fig. 2. In this example a data
file has been loaded and a Deinterleaving tool employed to process the PDWs created
for the signals that make up the data record. The hierarchical tree structure shown
in Fig. 2 has been built up through analysis and represents a unique and powerful
approach to deinterleaving processing that is one of IDEA's main strengths. Each
node of the tree is labelled according to the analysis technique that was used to
create it, including input parameters that are employed by an analysis tool. Once
a SOI or portion of a SOI has been isolated through processing and a node created
for it, the pulses of that node may be operated on further by a number of other
tools, the intent being to reveal additional parametric information on the signal.
Nodes may also be grouped together within the IDEA tree structure and this is a
particularly useful facility when looking for evidence of signal fragmentation.

3.3.  Processing  Tools 

3.3.1.  Data  Viewers 

The data View tools make up the first set of tools that are likely to be used
by an ELINT analyst and include: As Text viewers, TOA Raster plots, parameter
Histograms, Parameter vs. Time plots, and Parameter vs. Parameter viewers. The
Histograms may be used to help isolate a SOI by setting up a Parametric Filter
within IDEA, which is essentially a parameter-based band-pass filter. The Parameter
vs. Parameter viewers, on the other hand, could be used to cue a cluster-based
deinterleaving tool. In Fig. 3 we show an AOA vs. RF plot which clearly illustrates
the presence of three radar sources and suggests that the Cluster Analysis
tool, also within IDEA, could readily deinterleave the three signals. The results
of cluster analysis for this particular data set are shown in the tree structure of
IDEA's main processing window in Fig. 2, where the nth feature vector, $\mathbf{x}(n)$, for
cluster processing has been defined in terms of the RF and AOA measurements via

$$\mathbf{x}(n) = [\,\mathrm{rf}(n) \;\; \mathrm{aoa}(n)\,]'. \qquad (1)$$

FIG. 3. AOA vs. RF data viewer.
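
The idea of a parameter-based band-pass filter, and of a two-element RF and AOA feature vector for cluster processing, can be sketched as follows. This is an illustration only: the dictionary representation, the function names and the numerical values are assumptions and do not reflect IDEA's internal data structures.

```python
def parametric_filter(pulses, parameter, low, high):
    """Keep the pulses whose named parameter lies in the pass band [low, high]."""
    return [p for p in pulses if low <= p[parameter] <= high]


def feature_vector(pulse):
    """Two-element feature vector [rf, aoa] for cluster processing."""
    return [pulse["rf"], pulse["aoa"]]


# Hypothetical pulses from three sources (RF in MHz, AOA in degrees).
pulses = [
    {"rf": 2800.0, "aoa": 10.0},
    {"rf": 9400.0, "aoa": 45.0},
    {"rf": 2805.0, "aoa": 11.0},
    {"rf": 5600.0, "aoa": -30.0},
]
selected = parametric_filter(pulses, "rf", 2700.0, 2900.0)
print([feature_vector(p) for p in selected])  # [[2800.0, 10.0], [2805.0, 11.0]]
```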


3.3.2.  Signal  Detectors 

The signal Detection tools provide an alternative means of recognising the presence
of a particular type of signal embedded in an interleaved data record and
include TDOA histograms of various forms. The TDOA histograms may be viewed
as autocorrelation devices and reveal information on the PRIs of interleaved signals,
e.g., see [9]. This information could then be used to prime a pulse TOA-based
sequence search algorithm. The Primed Sequence Search tool within IDEA is another
method of isolating a SOI and searches for repetitive patterns in the arrival time
sequence recorded for a signal [6].
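
One simple form of TDOA histogram counts the differences between all pairs of pulse TOAs up to a maximum lag, which is why it behaves like an autocorrelation of the pulse train. The sketch below is a generic illustration of that form under assumed names and values; it is not IDEA's implementation, and the other histogram forms mentioned above are not covered.

```python
from collections import Counter


def tdoa_histogram(toas, bin_width, max_tdoa):
    """Count TOA differences between all pulse pairs with a difference up to max_tdoa."""
    ordered = sorted(toas)
    hist = Counter()
    for i, start in enumerate(ordered):
        for later in ordered[i + 1:]:
            diff = later - start
            if diff > max_tdoa:
                break                      # later pulses are even further away
            hist[int(diff // bin_width)] += 1
    return hist


# Two hypothetical interleaved pulse trains with PRIs of 100 and 130 time units.
train_a = [100 * k for k in range(13)]
train_b = [7 + 130 * k for k in range(10)]
hist = tdoa_histogram(train_a + train_b, bin_width=1, max_tdoa=150)
print(hist[100], hist[130])  # 12 9
```

The two PRIs appear as the dominant bins, and these values could then be used to prime a sequence search.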

3.3.3.  Deinterleaving  Tools 

The Deinterleaving tools supported by IDEA include Parametric Filters and a
Primed Sequence Search algorithm as previously stated. The tools also include an
Auto Sequence Search technique [7], the essence of which is arguably the workhorse
deinterleaving method for a large number of fielded radar intercept receivers.

Cluster Analysis makes up the final Deinterleaving tool. In the context of ELINT
processing and extraction, cluster analysis seeks to assign a sequence of N feature
vectors $\{\mathbf{x}(0), \mathbf{x}(1), \ldots, \mathbf{x}(N-1)\}$ derived from a buffer of N radar pulses, where
each $\mathbf{x}(n)$ is a real-valued feature vector, to a finite number of source classes by
searching for structure in a multi-dimensional feature space. IDEA contains an
unsupervised neural network algorithm for cluster analysis that is also self-evolving
in that the network adapts itself to the number of radar emitters or emitter modes
detected in a data buffer. An early version of the method has been reported in [3]
and the dialogue box for IDEA's current implementation of the algorithm is shown
in Fig. 4.

FIG. 4. The IDEA cluster analysis dialogue box.

Despite the strength of cluster analysis as a deinterleaving tool, it can be vulnerable
to emitter agility. Consider, for example, a radar source that hops between three
different RFs while its other parameters remain constant. A cluster-based
deinterleaving technique might report the intercepted signal as three separate emitters.
One possible solution to help remedy this situation is to re-group those pulses from
the three clusters as a trial merge within IDEA and compute a TDOA histogram
for the pooled pulses. The resulting TDOA histogram could then be compared
with the TDOA histogram computed for each of the individual clusters prior to
merging the data. If the TDOA histogram for the pooled pulses is more "ordered"
compared to that obtained for each of the non-pooled pulse TDOA histograms, the
merge should be allowed and the signals re-associated. This type of processing is
essentially a visual implementation of the entropy-based scheme outlined in [1].
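
The trial merge can be illustrated numerically. In the sketch below, the degree of "order" of a TDOA histogram is measured by its Shannon entropy, and a merge is supported when the pooled histogram has lower entropy than the histogram of every individual cluster. The use of Shannon entropy on successive-pulse TDOAs is an assumption made for illustration; the scheme of [1] is not reproduced here, and all names and values are hypothetical.

```python
import math
from collections import Counter


def first_difference_histogram(toas, bin_width):
    """Histogram of TDOAs between successive pulses, keyed by bin index."""
    ordered = sorted(toas)
    return Counter(int((t1 - t0) // bin_width) for t0, t1 in zip(ordered, ordered[1:]))


def histogram_entropy(hist):
    """Shannon entropy in bits; a lower value indicates a more ordered histogram."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in hist.values())


def merge_is_supported(clusters, bin_width):
    """True if the pooled TDOA histogram is more ordered than each cluster's own."""
    pooled = [t for cluster in clusters for t in cluster]
    pooled_entropy = histogram_entropy(first_difference_histogram(pooled, bin_width))
    return all(
        pooled_entropy < histogram_entropy(first_difference_histogram(c, bin_width))
        for c in clusters
    )


# Hypothetical emitter with a PRI of 100 time units, hopping irregularly between three RFs.
hop_pattern = [0, 1, 0, 2, 2, 1, 0, 0, 2, 1, 1, 2]
clusters = [[100 * n for n, rf in enumerate(hop_pattern) if rf == k] for k in range(3)]
print(merge_is_supported(clusters, bin_width=1))  # True
```

Note that if the emitter cycled through its RFs in a strictly regular order, each cluster would itself show a single constant TDOA, and this simple criterion would not favour the merge.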

3.3.4. Signal Analysers

The IDEA Analysis tools operate on a SOI once it has been isolated from other
signals. The tools currently include a histogram-based method for revealing the
clock period that is used to generate the PRI sequence of a discrete jittered signal [9]
and recursive estimators for the mean PRI and TOA phase parameters of a signal.
To illustrate the latter, consider the following model-based approach to pulse TOA
analysis in which the nth pulse TOA, $t_n$, for a signal is modelled as

$$t_n = \psi + nT + a_n, \qquad n = 0, 1, \ldots, N-1, \qquad (2)$$

where $\psi$ and T are constant parameters denoting the TOA phase and mean PRI
respectively, and $a_n$ is a Gaussian jitter random variable with distribution $\mathcal{N}(0, \sigma^{2})$.
Least squares estimators for $\psi$ and T have been reported in [4] and implemented
in IDEA. Results for the least squares estimator of T are shown in Fig. 5.
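
For a pulse train in which the nth TOA equals a constant phase plus n times the mean PRI plus zero-mean jitter, the ordinary least squares estimates of the phase and mean PRI are those of a straight-line fit of TOA against pulse index. The Python sketch below is a batch form given for illustration; it is not the recursive estimator of [4], and it assumes no missing pulses. The function name is hypothetical.

```python
def least_squares_toa(toas):
    """Return (phase, mean_pri) from a straight-line fit of TOA against pulse index."""
    n_pulses = len(toas)
    if n_pulses < 2:
        raise ValueError("at least two pulses are required")
    index_mean = (n_pulses - 1) / 2.0
    toa_mean = sum(toas) / n_pulses
    numerator = sum((n - index_mean) * (t - toa_mean) for n, t in enumerate(toas))
    denominator = sum((n - index_mean) ** 2 for n in range(n_pulses))
    mean_pri = numerator / denominator
    phase = toa_mean - mean_pri * index_mean
    return phase, mean_pri


# Hypothetical jitter-free pulse train: phase 5.0, PRI 100.0.
phase, mean_pri = least_squares_toa([5.0 + 100.0 * n for n in range(10)])
print(round(phase, 6), round(mean_pri, 6))  # 5.0 100.0
```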

4.  CONCLUDING  REMARKS 

The interactive nature of IDEA emphasises the important role that a human
operator can play in the laboratory-based analysis of ELINT data. It also highlights
the types of skills that can be brought to bear by an experienced analyst,
particularly as they relate to visual pattern recognition and a synergistic approach to
ELINT problem solving. To translate these skills into an autonomous system where
signal intercept and analysis is carried out automatically, e.g., see [8], represents
one of the major technical challenges facing the ELINT community today.

FIG. 5. Recursive least squares estimation of the mean PRI, T, of a signal.

Wigner-Ville analysis of HF radar measurements of a
surrogate theatre ballistic missile

The paper describes results obtained by applying the Wigner-Ville
distribution to high frequency line-of-sight radar measurements of a surrogate
theatre ballistic missile during powered flight. The Wigner-Ville distribution
is able to discriminate between target and interfering transient events
and to accurately determine the instantaneous Doppler law of the target.
It is a useful pre-processing step for determining the instantaneous received
signal level and the coherent processing loss due to target acceleration.


Key Words: time-frequency, high frequency radar, theatre ballistic missile


1.  INTRODUCTION 

In this paper we describe the results obtained by applying the Wigner-Ville
distribution (WVD) to measurements obtained from a high frequency (HF) line-of-sight
radar. The measurements were of an accelerating surrogate theatre ballistic missile
(TBM) target during powered flight immediately following launch. We have shown
that the WVD is able to discriminate between the accelerating target and an
interfering transient event which produces a similar Doppler spectrum to the target and
hence a potentially confusing display at the radar output. The WVD is also able
to accurately determine the instantaneous Doppler law of the target, at a temporal
resolution comparable with the radar sweep (or pulse) duration. Knowledge of the
instantaneous Doppler law is applied to the problem of determining the coherent
processing loss due to target acceleration, and to determining the instantaneous
received signal level. For an example of time-frequency distributions applied to HF
skywave data (not of accelerating targets) see [1].

The paper is set out as follows. We provide background to the data used in
section 2. Next, in section 3, we provide a signal model and justify the selection
of the WVD from the many time-frequency distributions that could have been
applied. We present our results in section 4 and provide our conclusions in section 5.

2.  BACKGROUND 

During September 1997 four surrogate TBMs were launched from a site in northern
West Australia to test a variety of TBM launch detection sensors. One of the
sensors in the suite was a bistatic HF line-of-sight radar. The radar was positioned tens of
kilometres from the launch site and it operated at a carrier frequency of approximately
25 MHz. The radar waveform was linear frequency modulated continuous
wave (LFMCW) and the waveform repetition frequency (WRF) was 50 Hz. A set of coherent
measurements was collected, each of 5 s duration, with a short inter-dwell
gap between successive coherent measurement intervals. The choice of waveform and low
WRF meant that Doppler measurements of the high velocity target were ambiguous
for most of the flight. The long coherent integration time (CIT) increased radar
sensitivity, although the target acceleration decreased the coherent processing gain
achieved and limited the accuracy of velocity measurements.


3.  SELECTION  OF  THE  WIGNER-VILLE  DISTRIBUTION 

The data consisted of complex time series which correspond to sequences of coherent
measurements from the particular azimuth range cells which contained the
target. Preliminary analysis suggested that the following signal model would be
appropriate:

$$z(t) = A e^{j 2\pi \left[ f_0 t + \frac{\beta}{2} t^{2} \right]} + c(t) + n(t) \qquad (1)$$

for $\{t : 0 < t < T\}$, where A is the complex amplitude, $f_0$ and $\beta$ are the linear FM
parameters, c(t) represents clutter, n(t) represents noise and T is the coherent
integration time (CIT) of the radar. In the case of a bistatic HF line-of-sight radar,
c(t) includes contributions such as the direct signal from the transmitter, range
sidelobes from the direct signal, additional targets (say, from a booster stage in a
multi-stage rocket), and meteor and ionospheric scatter. In general, both the clutter
and noise are unknown, although we assume the relative energy is such that

$$\frac{|A|^{2} T}{\int_0^T |c(t)|^{2} \, dt + \int_0^T |n(t)|^{2} \, dt} \gg 1 \qquad (2)$$

and thereby consider z(t) as deterministic with unknown parameters, unknown
clutter and background noise, and high signal to clutter plus noise energy ratio.
The objective is to determine $f_0$ and $\beta$ as part of the task of establishing the
accelerating target dynamics.

Next we discuss the suitability of using the Wigner-Ville distribution (WVD) to
determine $f_0$ and $\beta$. We consider the case of continuous time signals for convenience
in the subsequent derivations, although the results extend to discrete time sequences
in a straightforward manner. Complete expositions on the WVD and instantaneous
frequency estimation are given in [2-4] and the references therein.


3.1. Definition of the Wigner-Ville distribution

The WVD of a deterministic signal z(t) is defined as

$$W(t, f) = \int_{-\infty}^{\infty} z\left(t + \frac{\tau}{2}\right) z^{*}\left(t - \frac{\tau}{2}\right) e^{-j 2\pi f \tau} \, d\tau \qquad (3)$$

where * denotes conjugation, z(t) is a continuous time analytic signal, and

$$z(t) = x(t) + j H[x(t)] \qquad (4)$$

for a real signal x(t) and the Hilbert transform operator H. Let Z(f) be the Fourier
transform of z(t) and X(f) the Fourier transform of x(t). For z(t) to be analytic,

$$Z(f) = \begin{cases} 2X(f), & f > 0 \\ X(f), & f = 0 \\ 0, & f < 0 \end{cases} \qquad (5)$$
The requirement that z(t) in (3) be analytic is motivated by:

1. Recognition that the quadratic form in (3) introduces interaction features, or
cross terms, between individual additive components of z(t). Enforcing that z(t)
be analytic at least eliminates cross terms generated by interaction between the
symmetric positive and negative frequency components of a real signal.

2. Additional computational savings realisable when restricting the domain of
W(t, f) to the half plane f > 0.

In many radar applications z(t) is the output of quadrature demodulation in the
radar receiver. The signal is complex but not necessarily zero for f < 0. In this case
z(t) can be used directly in (3) without enforcing (5) provided W(t, f) is evaluated
over the full t, f plane.
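
For sampled data, the spectral condition for an analytic signal (positive frequencies doubled, zero frequency unchanged, negative frequencies removed) can be imposed directly on the discrete Fourier transform of a real sequence. The Python sketch below uses a direct DFT from the standard library for clarity rather than speed. Leaving the Nyquist bin of an even-length sequence unscaled is a discrete-time convention assumed here; it has no counterpart in the continuous-time condition.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(N^2), adequate for short records)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n_samples = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_samples) for k in range(n_samples))
        / n_samples
        for n in range(n_samples)
    ]


def analytic_signal(x):
    """Analytic sequence associated with the real sequence x."""
    n_samples = len(x)
    half = n_samples // 2
    spectrum = dft(x)
    for k in range(n_samples):
        if k == 0 or (n_samples % 2 == 0 and k == half):
            continue                 # zero frequency (and Nyquist bin) unchanged
        if k <= half:
            spectrum[k] *= 2         # positive frequencies doubled
        else:
            spectrum[k] = 0          # negative frequencies removed
    return idft(spectrum)


# A real cosine maps to the corresponding complex exponential.
x = [math.cos(2 * math.pi * 3 * n / 16) for n in range(16)]
z = analytic_signal(x)
error = max(abs(z[n] - cmath.exp(2j * math.pi * 3 * n / 16)) for n in range(16))
print(error < 1e-9)  # True
```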

3.2.  WVD  of  a  single  linear  FM  signal 
Consider the WVD for a single component linear FM signal. Let

$$z(t) = A\, e^{j2\pi[f_0 t + \frac{\beta}{2} t^2]} \quad \text{for } t \in (-\infty, \infty) \tag{6}$$

with complex amplitude $A$, initial frequency $f_0$ and chirp rate $\beta$, then it can be
shown that

$$W(t,f) = |A|^2\, \delta(f - [f_0 + \beta t]) \tag{7}$$

The WVD of a single component complex linear FM signal shows ideal concentration
in the time frequency plane along the instantaneous frequency (IF) law of the
signal. The instantaneous frequency law, $f_i(t) = f_0 + \beta t$, can be determined by

$$f_i(t) = \arg\max_f\, [W(t,f)] \quad \forall t \tag{8}$$

For signals which are not purely deterministic, such as that in (1), the instantaneous
frequency law may be estimated using (8) to give an IF estimate $\hat{f}_i(t)$.
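As an illustration of (7) and (8), the following is a minimal discrete-time sketch, not the processing applied to the radar data. It assumes a sampled, unit-amplitude linear FM signal with hypothetical parameters `f0` and `beta` (in cycles per sample), forms the lag product over the largest symmetric lag window available at each sample, and takes an FFT over lag. The function name `wvd` and the frequency-axis scaling are choices made for this example only.

```python
import numpy as np

def wvd(z):
    """Discrete Wigner-Ville distribution of a complex sequence z.

    Returns (w, freqs): w[t, k] is the distribution at sample t and
    frequency freqs[k], in cycles per sample over [0, 0.5).
    """
    z = np.asarray(z, dtype=complex)
    n = len(z)
    w = np.zeros((n, n))
    for t in range(n):
        # Largest symmetric lag window that stays inside the record.
        lmax = min(t, n - 1 - t, n // 2 - 1)
        m = np.arange(-lmax, lmax + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[m % n] = z[t + m] * np.conj(z[t - m])
        # The kernel is conjugate-symmetric in m, so its FFT is real.
        w[t] = np.fft.fft(kernel).real
    # The lag product advances in phase by 2*f per unit lag, so FFT
    # bin k corresponds to frequency k / (2 n).
    freqs = np.arange(n) / (2.0 * n)
    return w, freqs

# Hypothetical linear FM test signal sweeping from 0.1 to 0.3 cycles/sample.
n = 256
t = np.arange(n)
f0, beta = 0.1, 0.2 / n
z = np.exp(2j * np.pi * (f0 * t + 0.5 * beta * t**2))

w, freqs = wvd(z)
if_est = freqs[np.argmax(w, axis=1)]   # peak of the WVD at each time, as in (8)
if_true = f0 + beta * t
mid = slice(n // 4, 3 * n // 4)        # skip the short lag windows near the edges
max_err = np.max(np.abs(if_est[mid] - if_true[mid]))
```

In the interior of the record the peak estimate follows `f0 + beta * t` to within the frequency grid spacing `1 / (2 n)`; near the edges the lag window is short and the estimate is coarser.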

3.3.  WVD  of  multi-component  signals 

The WVD can become uninterpretable when there are more than a modest number
of component signals with comparable energy. Consider the two additive component
signal

$$z(t) = A\,p(t) + B\,q(t) \tag{9}$$

with complex scalars $A$ and $B$ and the component energy constraint

$$\int_{-\infty}^{\infty} |p(t)|^2\, dt \approx \int_{-\infty}^{\infty} |q(t)|^2\, dt \tag{10}$$

The WVD of $z(t)$ may be written in terms of the signal components $p(t)$ and $q(t)$

$$W_z(t,f) = |A|^2 W_p(t,f) + A B^{*} W_{pq}(t,f) + B A^{*} W_{qp}(t,f) + |B|^2 W_q(t,f) \tag{11}$$

The WVD of the sum of the two components $p(t)$ and $q(t)$ is the sum of the
individual WVD of each component, $W_p$ and $W_q$ (auto-terms), plus the cross WVD
between components, $W_{pq}$ and $W_{qp}$. The individual auto and cross WVD are scaled
by the appropriate products of the scalars $A$ and $B$.

Given the constraint (10), and assuming now that one is primarily interested
in determining the IF law $f_i^{p}(t)$ of component $p(t)$, the relative interpretability of
$W_z(t,f)$ will be determined by the scalars $A$ and $B$. If $|A| \gg |B|$ then

$$W_z(t,f) \approx |A|^2 W_p(t,f) \tag{12}$$

and $f_i^{p}(t)$ will be well estimated by (8). For $|A| \approx |B|$ all terms in (11) will contribute
to $W_z$, making interpretation and estimation based on $W_z$ difficult. With $|A| \ll |B|$

$$|A|^2 W_p(t,f) \ll A B^{*} W_{pq}(t,f) + B A^{*} W_{qp}(t,f) + |B|^2 W_q(t,f) \tag{13}$$

and $W_z$ will be of little use in determining $f_i^{p}(t)$.
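The decomposition (11) follows from the bilinearity of the lag product and can be checked numerically. The sketch below is illustrative only: the helper `cross_wvd`, the two test components (a tone and a chirp of equal energy) and the scalars `A` and `B` are assumptions made for this example.

```python
import numpy as np

def cross_wvd(x, y):
    """Discrete cross-WVD W_xy[t, k] of two equal-length complex sequences."""
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    n = len(x)
    w = np.zeros((n, n), dtype=complex)
    for t in range(n):
        lmax = min(t, n - 1 - t, n // 2 - 1)
        m = np.arange(-lmax, lmax + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[m % n] = x[t + m] * np.conj(y[t - m])
        w[t] = np.fft.fft(kernel)
    return w

n = 128
t = np.arange(n)
# Two hypothetical equal-energy components: a tone and a linear FM signal.
p = np.exp(2j * np.pi * 0.10 * t)
q = np.exp(2j * np.pi * (0.30 * t - 0.5 * (0.1 / n) * t**2))
A, B = 1.0 + 0.5j, 0.8 - 0.2j          # arbitrary complex scalars

z = A * p + B * q
w_z = cross_wvd(z, z)                  # auto-WVD of the composite signal
w_sum = (abs(A) ** 2 * cross_wvd(p, p)
         + A * np.conj(B) * cross_wvd(p, q)
         + B * np.conj(A) * cross_wvd(q, p)
         + abs(B) ** 2 * cross_wvd(q, q))

# The two cross terms are complex conjugates of each other, so their
# scaled sum in (11) is real.
cross_is_conjugate = np.allclose(cross_wvd(q, p), np.conj(cross_wvd(p, q)))
```

Here `w_z` and `w_sum` agree to rounding error, and the two cross terms together contribute a real, oscillating component located between the two auto-terms.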

3.4.  WVD  of  transient  signals 

Consider a single component signal which is non-zero for only a small fraction of
the radar CIT. Let $s(t)$ be an arbitrary finite energy signal with $\Delta \ll T$ and define
$z(t)$ as

$$z(t) = \begin{cases} s(t) & t_c - \frac{\Delta}{2} < t < t_c + \frac{\Delta}{2} \\ 0 & \text{otherwise} \end{cases} \tag{14}$$

with $\frac{\Delta}{2} < t_c < T - \frac{\Delta}{2}$. Spectrum analysis procedures which assume stationarity
and are applied to $z(t)$ will not capture the transient nature of this signal. Any
stationary spectrum estimate will appear broadband, of the order $1/\Delta$ Hz bandwidth
or greater, depending on $s(t)$, and may be confused with the spectrum of signals
such as that in (1).

The WVD has time and frequency marginal properties which ensure that the
time and frequency support of $z(t)$ are captured in the distribution. The time
marginal property $|z(t)|^2 = \int_{-\infty}^{\infty} W_z(t,f)\, df$ indicates that for $z(t)$ in (14) $W_z \neq 0$
only in the interval $t_c - \frac{\Delta}{2} < t < t_c + \frac{\Delta}{2}$. Hence, $W_z$ captures the transient nature
of $z(t)$.
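The time marginal and the time support property can be verified on a sampled example. The following sketch assumes a hypothetical transient (a short tone burst occupying samples 40 to 59 of a 128 sample record) and a simple discrete WVD; in this discrete form the sum over frequency bins equals `n * |z[t]|**2`.

```python
import numpy as np

def wvd(z):
    """Discrete Wigner-Ville distribution w[t, k] of a complex sequence z."""
    z = np.asarray(z, dtype=complex)
    n = len(z)
    w = np.zeros((n, n))
    for t in range(n):
        lmax = min(t, n - 1 - t, n // 2 - 1)
        m = np.arange(-lmax, lmax + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[m % n] = z[t + m] * np.conj(z[t - m])
        w[t] = np.fft.fft(kernel).real
    return w

n = 128
t = np.arange(n)
start, stop = 40, 60                    # hypothetical transient support
z = np.zeros(n, dtype=complex)
z[start:stop] = np.exp(2j * np.pi * 0.2 * t[start:stop])

w = wvd(z)
time_marginal = w.sum(axis=1) / n       # discrete analogue of the integral over f
outside = np.concatenate([w[:start], w[stop:]])
```

The rows of `w` outside the burst are identically zero because the lag product needs both `z[t + m]` and `z[t - m]` to be non-zero, which cannot happen when `t` lies outside the support.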


4.  RESULTS 

We have applied the WVD in the following ways: to assist with discriminating
between accelerating and transient targets or scatterers, to determine the target
instantaneous Doppler law, and to determine the instantaneous received target
energy law and the processing loss caused by the target acceleration.

4.1. Accelerating target v. transient
Figures 1 and 2 (left) show the range-Doppler (RD) maps generated for two
separate beam steer directions measured during the same CIT. In figure 1 the
accelerating target is visible as a large smear in Doppler at range cell 7. In figure 2
(left) the transient meteor scatterer is visible at range cell 19, which is also smeared
in the Doppler domain. We seek improved discrimination between the accelerating
and the transient scatterers. Figures 2 (right) and 3 (left) show the WVD computed
from the time series corresponding to the mentioned range cells. The instantaneous
Doppler law of the accelerating target is visible and so is the transient behaviour of
the meteor scatterer. The WVD is able to discriminate clearly between these two
types of scatterers.


4.2. Instantaneous Doppler law

The instantaneous Doppler law has been extracted from the peak of the WVD
shown in figure 2 (right) according to (8). CITs prior to and later than this
measurement interval have also been analysed using the WVD, with the instantaneous
Doppler law again estimated from the peak according to (8). Appropriate smoothing
of the estimates using polynomial models reduces estimate variance, allows
Doppler law prediction into the inter-dwell intervals and can be easily integrated
to produce phase law estimates. The sequence of unsmoothed Doppler estimates
shown in figure 3 (right) covers an interval of approximately 10 CITs. Clearly,
it is possible to determine accurate instantaneous Doppler sequences, with temporal
resolution of approximately the sweep duration, as compared with the CIT
for conventional Doppler processing (20 ms v. 5.12 s). Note that the target second
stage motor ignited at approximately 18 s and that an accurate time of ignition can be
determined from the instantaneous Doppler law.
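The polynomial smoothing, prediction and integration steps mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the procedure applied to the measured data: the quadratic Doppler law, the noise level and the polynomial degree are all hypothetical, and only the 20 ms sweep and 256 sweep CIT are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

sweep = 0.02                        # sweep duration in seconds (20 ms)
t = np.arange(256) * sweep          # one 5.12 s CIT
# Hypothetical quadratic Doppler law (Hz) standing in for the WVD peak estimates.
doppler_true = 5.0 + 1.5 * t + 0.2 * t**2
doppler_raw = doppler_true + 0.5 * rng.standard_normal(t.size)

# Least-squares polynomial model of the raw estimates.
coeffs = np.polyfit(t, doppler_raw, deg=2)
doppler_smooth = np.polyval(coeffs, t)

# Predict the Doppler law a short way into the following inter-dwell interval.
t_gap = t[-1] + sweep * np.arange(1, 26)
doppler_pred = np.polyval(coeffs, t_gap)

# Integrate the polynomial to obtain a phase law in radians (zero at t = 0).
phase_coeffs = 2.0 * np.pi * np.polyint(coeffs)
phase = np.polyval(phase_coeffs, t)

rms_raw = np.sqrt(np.mean((doppler_raw - doppler_true) ** 2))
rms_smooth = np.sqrt(np.mean((doppler_smooth - doppler_true) ** 2))
```

A low-order model is only appropriate over intervals in which the Doppler law is smooth; an event such as motor ignition would call for separate fits on either side of it.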

4.3.  Instantaneous  energy  law 

Knowledge of the instantaneous Doppler law can also be used to construct a
demodulation reference signal, $s(t)$. This signal has unity amplitude and an
instantaneous frequency law which is the conjugate of the estimated instantaneous Doppler
law of the target, i.e. the instantaneous frequency law of $z(t)$. $s(t)$ can be used to
demodulate $z(t)$, giving the approximately stationary time series $z'(t)$.

$$z'(t) = z(t) \cdot s(t) \tag{15}$$

The instantaneous energy of the demodulated time series is

$$E(t) = G\,|z'(t)|^2 \tag{16}$$

where $G$ is some smoothing operator. The instantaneous energy is shown in figure 4
(left), which shows three different levels of local smoothing (i.e. different $G$).
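Equations (15) and (16) can be sketched for a synthetic echo as follows. The Doppler law, the slowly varying amplitude and the 11 sample window are hypothetical; in practice the phase law would come from integrating the estimated instantaneous Doppler law, whereas here the true law is used for clarity. The smoothing operator $G$ is taken to be a symmetric (zero phase) moving average.

```python
import numpy as np

fs = 50.0                           # waveform repetition frequency (Hz)
n = 256                             # sweeps in one CIT
t = np.arange(n) / fs
f0, beta = 5.0, 2.0                 # hypothetical Doppler law f0 + beta * t (Hz)
phase = 2.0 * np.pi * (f0 * t + 0.5 * beta * t**2)
amplitude = 1.0 + 0.5 * np.sin(2.0 * np.pi * 0.3 * t)   # slowly varying echo strength
z = amplitude * np.exp(1j * phase)

# Reference with unit amplitude and the conjugate phase law.
s = np.exp(-1j * phase)
z_demod = z * s                     # equation (15)

def moving_average(x, length):
    """Symmetric (zero phase) moving average; length should be odd."""
    return np.convolve(x, np.ones(length) / length, mode="same")

# Equation (16) with G chosen as an 11 sample moving average.
energy = moving_average(np.abs(z_demod) ** 2, 11)
```

After demodulation the chirp phase is removed and only the amplitude variation remains, which is what makes the series approximately stationary.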

4.4.  Processing  loss  due  to  target  acceleration 

The processing loss due to target acceleration compared with a comparable target
of constant velocity can be determined. One contrasts standard Doppler processing
applied to the time series $z(t)$ and to the demodulated version $z'(t)$. It can be
seen from figure 4 (right) that the processing loss is approximately 10 dB for this
particular CIT.

FIG. 1. Range-Doppler map showing the accelerating target smeared in Doppler in range
cells 6 and 7. The direct wave from the transmitter and ground clutter is visible surrounding 0 Hz
Doppler and centered in range cell 2. The coasting spent first stage of the two stage TBM can be
seen at range cell 6 with 10 Hz Doppler. (Plot header: frequency 24542.0 kHz, BW = 25.0 kHz;
Doppler axis in Hz, WRF = 50 Hz.)
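The processing loss comparison of Section 4.4 can be sketched by comparing the peak of the Doppler spectrum before and after demodulation. The example below is a sketch with a hypothetical constant-amplitude target and chirp rate, so the loss it produces is not the measured value quoted above; only the 50 Hz WRF and 256 sweep CIT are taken from the text.

```python
import numpy as np

fs = 50.0                           # waveform repetition frequency (Hz)
n = 256                             # sweeps in one 5.12 s CIT
t = np.arange(n) / fs
f0, beta = 5.0, 2.0                 # hypothetical Doppler law f0 + beta * t (Hz)
phase = 2.0 * np.pi * (f0 * t + 0.5 * beta * t**2)

z = np.exp(1j * phase)              # accelerating target echo
z_demod = z * np.exp(-1j * phase)   # demodulated series, as in (15)

# Standard Doppler processing: power spectrum over the full CIT.
spec_orig = np.abs(np.fft.fft(z)) ** 2
spec_demod = np.abs(np.fft.fft(z_demod)) ** 2

# Loss of peak response caused by assuming a constant velocity target.
loss_db = 10.0 * np.log10(spec_demod.max() / spec_orig.max())
```

The demodulated series integrates coherently into a single Doppler cell, while the chirp energy is spread over roughly `beta * n / fs` Hz of Doppler, which is the source of the loss.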


5.  CONCLUSIONS 

The Wigner-Ville distribution has been applied to HF line-of-sight radar
measurements of a TBM launch. This approach has assisted with discriminating between
the accelerating target and transient meteor scatterers. It has provided estimates
of the Doppler law of the target at a temporal resolution of approximately 20 ms
compared with standard processing which had a temporal resolution of 5.12 s. It
has also allowed determination of the instantaneous energy law of the target and
has provided an estimate of the processing loss when standard Doppler processing
is applied to a rapidly accelerating target.

FIG. 2. (Left) Range-Doppler map showing the transient meteor scatterer in range cell 19.
It is difficult to discriminate between this smeared feature and the smeared accelerating target
shown in the previous figure. (Right) WVD of the time series containing the accelerating target
from range cell 7 in figure 1. Second stage ignition occurred at 18 s.

FIG. 3. (Left) WVD of the time series containing the transient meteor scatterer from range
cell 19 in figure 2 (left). (Right) Doppler v. elapsed time, computed using the WVD. The gaps
are due to missing sweeps during the radar interdwell gap and some lost instantaneous Doppler
detections at either end of individual radar CITs. The plot is a sequence of point measurements,
one per radar sweep, and not a continuous line.

FIG. 4. (Left) Instantaneous energy law v. elapsed time for the interval 16.5 s to 21.5 s
after launch. The three curves correspond to 1 (..), 11 (- -) and 19 (-) sample zero phase moving
average smoothing. (Right) The Doppler spectrum computed over the full CIT for the original
time series (-.) and for the demodulated time series (-). The processing loss caused by assuming
a constant velocity target is approximately 10 dB in this case.


This paper is concerned with colored noise matched filtering when the
noise covariance is unknown a priori. Both principal components analysis
and canonical correlation analysis require knowledge of the noise covariance
matrix and its inverse. Furthermore, neither of these methods yields
an optimal representation of the noise subspace for rank one detection
problems even when the covariance is known. The multistage Wiener filter
is shown to be optimal when the noise covariance is unknown in the sense
that it determines the best noise subspace representation and reduced-rank
approximation as a function of rank.


Key Words: rank reduction, signal representation, signal compression.


1.  INTRODUCTION 

Classical  detection  problems  in  radar,  sonar  and  communications  determine  the 
presence  or  absence  of  a  target  signal  observed  in  noise.  It  is  common  to  assume 
that  all  signals  are  independent  Gaussian  random  processes  as  a  starting  point  in 
the  analysis.  Under  these  conditions,  the  solution  is  found  by  analyzing  the  target 
(or  desired)  signal  and  the  noise  statistics.  The  target  analysis  is  achieved  through 
the  use  of  prior  knowledge  under  the  hypothesis  that  the  target  is  present.  This  step 
may  utilize  a  steering  vector,  a  matched  field  processor  or  correlation  information 
such  as  a  CDMA  code.  The  noise  covariance  can  not  be  assumed  to  be  known 
in  practical  problems,  and  the  noise  statistics  must  be  estimated.  This  paper  is 
concerned  primarily  with  the  estimation  of  the  noise  statistics  and  the  impact  that 
it  has  on  the  detection  problem. 

Define an $N$-dimensional signal vector $\mathbf{s}$, which can be considered to be a radar
steering vector, the output from a matched field processing routine for sonar or a
correlation vector used in communications. Let the $N$-dimensional test vector $\mathbf{x}$ be
the observed vector being considered for target presence or absence. If the $N \times N$
noise covariance matrix $\mathbf{R}$ is known a priori, then a popular constant false alarm
rate (CFAR) test is given by

$$\Lambda = \frac{|\mathbf{s}^H \mathbf{R}^{-1} \mathbf{x}|^2}{\mathbf{s}^H \mathbf{R}^{-1} \mathbf{s}} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{1}$$

where $H_1$ denotes the target present hypothesis, $H_0$ is the null hypothesis and $\eta$ is
a threshold determined by the false alarm rate [1-3].

The numerator of the test statistic in (1) is the output power of a colored noise
matched filter $\mathbf{v}^H \mathbf{z}$, where the whitened signal vector is $\mathbf{v} = \mathbf{R}^{-1/2}\mathbf{s}$ and the
whitened data vector is $\mathbf{z} = \mathbf{R}^{-1/2}\mathbf{x}$. The test then compares the ratio of the colored
noise matched filter's output power and the output noise power to a threshold.
The expected values of the numerator and the denominator are identical when a target
is absent in the known covariance case, and the mean value of the logarithm
of the left-hand side of (1) is 0 dB. The mean value of this test statistic increases
when the target is present as a function of the signal-to-interference plus noise ratio
(SINR). Note that the colored noise matched filter output is identical to the output
of a Wiener filter $\mathbf{g} = \mathbf{R}^{-1}\mathbf{s}$: $y = \mathbf{v}^H \mathbf{z} = \mathbf{g}^H \mathbf{x}$.
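The identity $y = \mathbf{v}^H\mathbf{z} = \mathbf{g}^H\mathbf{x}$ and the test statistic in (1) can be checked numerically. The sketch below assumes a randomly generated Hermitian positive definite covariance, a random unit steering vector and a random test vector; none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                           # dimension N

# Hypothetical known covariance (Hermitian, positive definite).
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
r = a @ a.conj().T + n * np.eye(n)

s = rng.standard_normal(n) + 1j * rng.standard_normal(n)
s /= np.linalg.norm(s)                          # unit-norm steering vector
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # test vector

# Inverse square root of R from its eigendecomposition.
evals, evecs = np.linalg.eigh(r)
r_inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.conj().T

v = r_inv_sqrt @ s                              # whitened signal vector
z = r_inv_sqrt @ x                              # whitened data vector
g = np.linalg.solve(r, s)                       # Wiener filter g = R^{-1} s

y_whitened = np.vdot(v, z)                      # v^H z
y_wiener = np.vdot(g, x)                        # g^H x

# Test statistic of (1): |s^H R^{-1} x|^2 / (s^H R^{-1} s).
statistic = abs(y_wiener) ** 2 / np.vdot(s, g).real
```

Note that `np.vdot` conjugates its first argument, so `np.vdot(v, z)` is the inner product $\mathbf{v}^H\mathbf{z}$.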

The estimation of the noise covariance is considered in Sect. 2, along with
discussions pertaining to sample support, complexity and rank reduction. The colored
noise matched filter is then analyzed in Sect. 3, where the multistage
Wiener filter is derived from an optimization of the noise subspace representation
as a function of rank. Simulation results are also depicted in Sect. 3 to demonstrate
algorithm performance. The summary is presented in Sect. 4.


2.  NOISE  COVARIANCE  ESTIMATION 

A  maximum  likelihood  estimate  is  often  used  to  replace  the  covariance  matrix 
when  the  true  covariance  matrix  is  not  known.  Replacing  the  matrix  inversion  in 
(1)  with  this  estimate  results  in  a  normalized  form  of  the  celebrated  sample  matrix 
inversion  (SMI)  algorithm  [4]. 

Many attributes of the SMI algorithm may not be obvious to the casual observer.
For example, the number of samples required for the statistics to converge in an
SINR sense is at least $2N$ independent and identically distributed (iid) snapshots.
Also the computational complexity grows as $O(N^3)$ due to the matrix inversion
requirement in the CFAR test, where $O(\cdot)$ denotes the highest order of the underlying
polynomial and at least $2N$ samples are used in the covariance estimation.
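A minimal sketch of the SMI computation with $2N$ training snapshots is given below. The interference scenario (a single strong interferer in unit white noise on a half-wavelength spaced line array) and the broadside steering vector are hypothetical and chosen only to make the example concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8                                   # dimension N
k = 2 * n                               # number of iid training snapshots

# Hypothetical true covariance: one strong interferer plus unit white noise.
a = np.exp(1j * np.pi * np.arange(n) * np.sin(np.deg2rad(30.0)))
r_true = 100.0 * np.outer(a, a.conj()) + np.eye(n)
s = np.ones(n, dtype=complex) / np.sqrt(n)      # broadside steering vector

# Draw K iid complex Gaussian snapshots with covariance r_true.
chol = np.linalg.cholesky(r_true)
white = (rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))) / np.sqrt(2.0)
x = chol @ white

r_hat = x @ x.conj().T / k              # sample (maximum likelihood) covariance
w_smi = np.linalg.solve(r_hat, s)       # SMI weight, proportional to r_hat^{-1} s

# Output SINR of the SMI weight relative to the known-covariance optimum.
sinr_smi = abs(np.vdot(w_smi, s)) ** 2 / np.vdot(w_smi, r_true @ w_smi).real
sinr_opt = np.vdot(s, np.linalg.solve(r_true, s)).real
rho = sinr_smi / sinr_opt               # normalized SINR, between 0 and 1
```

The ratio `rho` is a random variable that depends on the particular training set; averaging it over many independent trials would show how the loss depends on the number of snapshots.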

The  detection  of  small  signals  in  competing  noise  requires  that  the  signal  space 
be  enlarged  to  obtain  a  subspace  where  the  signal  and  noise  can  be  separated  with 
sufficient  resolution.  This  fact  has  led  to  adaptive  processing  in  two  and  more 
dimensions  using,  for  example,  spatial,  temporal,  Doppler  and  polarization  degrees 
of  freedom.  An  impact  of  increasing  both  the  number  of  dimensions  in  the  signal 
space  (for  signal  discrimination)  and  increasing  the  bandwidth  of  each  dimension 
in  that  space  (for  resolution)  is  that  the  parameter  N  quickly  becomes  very  large. 

The critical problem which occurs as $N$ increases is satisfying the requirement for
at least $2N$ iid samples for covariance estimation. This is particularly difficult in
the radar and sonar problems, where the samples are taken from disparate spatial
locations. In other words, the training region for estimating the noise statistics is
in the range domain. Thus the sensor must estimate the noise statistics in the test
cell using data which are increasingly distant as the parameter $N$ increases.

Rank reduction is a method to potentially reduce the sample support and
computational complexity requirements of these large dimensional problems. The goal
here is to find a low rank subspace that can accurately represent the noise present
in the test cell. If the rank is reduced from $N$ to $M < N$, then the sample support
requirement and the computational complexity can both be reduced accordingly.
Some approaches have been proposed to solve this problem from a statistical
framework; however, they presuppose the estimation of the noise covariance matrix.

The  principal  components  inverse  (PCI)  and  eigencanceler  algorithms  utilize  a 
low  rank  estimate  formed  by  those  eigenvectors  of  the  SMI  covariance  matrix  which 
correspond  with  the  largest  eigenvalues  [5-7].  The  cross-spectral  metric  (CSM)  uses 
a  low  rank  estimate  composed  of  those  eigenvectors  of  the  SMI  covariance  matrix 
which  actually  maximize  the  SINR  (this  solution  has  been  shown  to  be  generally 
different  than  that  obtained  by  PCI  and  the  eigencanceler)  [8-11].  One  key  attribute 
of  the  CSM  method  is  that  it  solves  the  optimal  signal  representation  problem  (for 
the  noise  process)  with  an  eigenvector  basis.  Therefore  optimal  compression  of  the 
noise  subspace  is  obtained  by  the  CSM  relative  to  an  eigenvector  basis.  The  CSM 
achieves  this  capability  due  to  its  target  signal  dependency,  a  feature  lacking  in  the 
PCI  and  eigencanceler  algorithms. 
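The difference between the two eigenvector selection rules can be illustrated in the generalized sidelobe canceler form, which is an assumption made for this sketch. With $d_0 = \mathbf{s}^H\mathbf{x}$ and $\mathbf{x}_0 = \mathbf{B}\mathbf{x}$, each eigenvector $\mathbf{u}_i$ of the covariance of $\mathbf{x}_0$ reduces the mean square error by $|\mathbf{u}_i^H \mathbf{r}_{x_0 d_0}|^2/\lambda_i$; principal components keep the largest $\lambda_i$, while the cross-spectral metric keeps the largest of these terms. The covariance below is random and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
r = g @ g.conj().T + np.eye(n)                  # hypothetical known covariance
s = np.ones(n, dtype=complex) / np.sqrt(n)      # unit-norm steering vector

# Generalized sidelobe canceler quantities: d0 = s^H x, x0 = B x.
q, _ = np.linalg.qr(s.reshape(-1, 1), mode="complete")
b = q[:, 1:].conj().T                           # (N-1) x N blocking matrix
var_d0 = np.vdot(s, r @ s).real
r_x0 = b @ r @ b.conj().T
r_x0d0 = b @ r @ s

lam, u = np.linalg.eigh(r_x0)                   # eigenvalues in ascending order
metric = np.abs(u.conj().T @ r_x0d0) ** 2 / lam # MSE reduction per eigenvector

def mse_by_rank(order):
    """MSE after keeping the first 1, 2, ... eigenvectors in the given order."""
    return var_d0 - np.cumsum(metric[order])

mse_pci = mse_by_rank(np.argsort(lam)[::-1])    # largest eigenvalues first
mse_csm = mse_by_rank(np.argsort(metric)[::-1]) # largest cross-spectral terms first
mmse = var_d0 - np.vdot(r_x0d0, np.linalg.solve(r_x0, r_x0d0)).real
```

By construction the cross-spectral ordering gives an error no larger than the principal components ordering at every rank, and both reach the full-rank minimum when all eigenvectors are kept.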

Unfortunately the damage has already been done in these cases since the averaging
to obtain the original covariance matrix (or an eigenvector basis for the space
spanned by its columns) needs to be accomplished first. If $N$ is reasonably large,
then  it  quickly  becomes  unreasonable  to  assume  that  there  is  a  region  of  space  large 
enough  to  support  the  2N  or  more  samples  which  are  iid  with  the  noise  present 
in  the  test  cell.  That  is,  the  clutter  and  interference  are  not  capable  of  retaining 
the  properties  of  stationarity  and  homogeneity  over  extended  regions.  Note  that 
computing  the  eigendecomposition  is  equivalent  to  computing  the  inverse  from  an 
information  perspective,  and  that  computing  the  inverse  is  equivalent  to  having 
solved  the  full-rank  problem.  Thus,  the  PCI,  eigencanceler  and  CSM  methods 
can  all  be  considered  as  attempts  at  reducing  the  rank  to  improve  performance 
after  having  to  compute  the  original  full-rank  solution.  While  it  is  true  that  the 
“larger”  eigenvectors  may  be  estimated  with  lower  sample  support,  they  generally 
span  a  suboptimal  subspace  and  cannot  compensate  for  problems  such  as  a  lack  of 
homogeneity. 

Finally,  it  is  mentioned  that  canonical  correlation  analysis  (CCA)  [12-14]  is  not  of 
use  in  classical  rank  one  signal  detection  problems.  While  one  could  use  canonical 
coordinates  with  the  SMI  covariance  matrix,  the  solution  degenerates  to  the  direct 
solution  of  the  full-rank  Wiener  filter.  CCA  therefore  falls  into  the  category  of 
requiring  the  full-rank  solution  while  not  even  motivating  rank  reduction  for  this 
problem. 


3.  COLORED  NOISE  MATCHED  FILTERING 

What is really desired is a method to obtain the normalized colored noise matched
filter without knowledge of either the true or SMI noise covariance matrices. In
addition, it is desirable to obtain this result while simultaneously only using those
training samples which are most correlated with the noise in the test cell. Therefore
it is necessary to obtain a low rank subspace which is optimized in the sense that
it provides the optimal representation of the noise in the test cell using the noise in
the training data. Note that this is an information theoretic solution which must
be solved before the full-rank problem is formulated. Now consider explicitly solving
this optimal representation problem without knowledge of the noise covariance
matrix, its inverse or its eigenstructure.

The first step in the optimization process is to recognize that the white noise
matched filter, $\mathbf{s}$ (assumed for convenience to be a unit vector), is the best solution
without prior knowledge of the noise covariance. All of the remaining information
(and coloring) that is not accounted for in the white noise matched filter “lives” in
its orthogonal complement $\mathbf{s}^{\perp}$. Define the first basis vector to be $\mathbf{s}$.

Consider the implementation of a full-rank Wiener filter in $\mathbf{s}^{\perp}$ to estimate the
colored noise projected into the rank one subspace $\mathbf{s}$. This yields a partitioned-form
processor called a generalized sidelobe canceler [11], and is well known to
be equivalent to the original colored noise matched filter for the target signal
with known noise covariance. While this result is useful, it cannot be the solution
currently sought since it requires knowledge of the noise covariance.

Another approach is to pick the optimal rank one subspace in $\mathbf{s}^{\perp}$ for estimating
the colored noise that “leaked” through the filter $\mathbf{s}$, without knowledge of the noise
covariance. This solution is easily verified to be the white noise matched filter $\mathbf{r}_{x_0 d_0}$,
which is the cross-correlation vector between the first stage white noise matched
filter output $d_0 = \mathbf{s}^H\mathbf{x}$ and the data vector $\mathbf{x}_0 = \mathbf{B}\mathbf{x}$ in $\mathbf{s}^{\perp}$ (where $\mathbf{B}$ is the
$(N-1) \times N$ projection matrix into $\mathbf{s}^{\perp}$, termed a blocking matrix). Define the next basis
generating vector as the unit vector $\mathbf{h}_1 = \mathbf{r}_{x_0 d_0}/\|\mathbf{r}_{x_0 d_0}\|$. Also define the scalar
Wiener filter $g_1$ as the optimal linear filter for estimating $d_0$ from $d_1 = \mathbf{h}_1^H\mathbf{x}_0$. The
first stage subspace estimation error is then given by $e_1 = d_0 - g_1 d_1$.

Next a recursion is developed which optimizes the remaining basis selection for
$i = 1, 2, \ldots, N-1$. Define $\mathbf{B}_i$ as the projection operator into $\mathbf{h}_i^{\perp}$, which yields the
data vector $\mathbf{x}_i = \mathbf{B}_i\mathbf{x}_{i-1}$. Then the cross-correlation vector $\mathbf{r}_{x_i e_i}$ maximizes the
noise residual in $\mathbf{h}_i^{\perp}$ without knowledge of the noise covariance matrix. The next
basis generating vector is then chosen to be the vector $\mathbf{h}_{i+1} = \mathbf{r}_{x_i e_i}/\|\mathbf{r}_{x_i e_i}\|$. Now
define the scalar $d_{i+1} = \mathbf{h}_{i+1}^H\mathbf{x}_i$. The Wiener filter $\mathbf{g}_{i+1}$, of dimension $i+1$, is the
optimal linear filter which estimates $d_0$ from the vector $\mathbf{d}_{i+1} = [\, d_1 \cdots d_{i+1} \,]^T$.
The error at each stage is then given by $e_i = d_0 - \mathbf{g}_i^H\mathbf{d}_i$. This stage-wise
maximization of the colored noise residual combines a whitening innovations process with a
correlation operator that converges to the colored noise matched filter.

The algorithm described thus far determines an optimal basis which has been
constructed to provide the best estimate of the residual noise at each stage.
Unfortunately the required Wiener filter at stage $i$ is a vector of length $i$. A simplification
is achieved, however, through the realization that the output can be mapped to the
input by the following relation:


l|r..e.||  =  ||rx*<i.||.  (2) 

The result of (2) unifies the decoupled stages of the innovations procedure while
simultaneously decomposing the vector Wiener filter into a nested multistage Wiener
filter [15-18] composed only of scalar weights $w_i$ (see Fig. 1). The ratio of the
output power of this filter to the output noise power yields an identical CFAR test to
(1):

$$\frac{|\epsilon_0|^2}{\pi_0} = \frac{|\mathbf{s}^H \mathbf{R}^{-1} \mathbf{x}|^2}{\mathbf{s}^H \mathbf{R}^{-1} \mathbf{s}} \underset{H_0}{\overset{H_1}{\gtrless}} \eta \tag{3}$$

where $\pi_0 = E\left[\,|\epsilon_0|^2 \mid H_0\,\right]$. This filter demonstrates the following properties: 1) The
colored noise matched filter is exactly obtained without a matrix inversion; 2) The
multistage Wiener filter generates a nonunitary diagonalization of the covariance
where the diagonal elements are mean-square error values; 3) The most correlated
signal energy is represented in the fewest spectral coefficients of this decomposition;
and 4) Optimal signal compression is obtained by truncating the multistage Wiener
filter.

An example is now considered to demonstrate the relative performance of the
multistage Wiener filter, the cross-spectral metric and the principal components
algorithms. Consider a 16 element array with half-wavelength spacing. There are
5 jammers present whose directions of arrival are -60°, -30°, -17°, 14° and 34°,
and whose signal-to-noise ratios in dB are 30, 32, 27, 30 and 29, respectively. The
minimum mean square error, which is also the output power of the colored noise
matched filter, is 3.3440 dB while the white noise matched filter output power is
23.1077 dB. The convergence as a function of rank for these reduced-rank processors
then explicitly evaluates the relative signal representation and compression
capability of the algorithms. The results, depicted in Fig. 2, demonstrate the superior
performance of the multistage Wiener filter.
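The stagewise structure can be sketched at the level of second-order statistics. The code below is a sketch under stated assumptions and does not reproduce the simulation above: the covariance is random and is assumed known so that the result can be checked, each new basis vector is taken along the cross-correlation between the current blocked data and the previous stage output $d_i$, and the blocking operators after the first stage are implemented as the projectors $\mathbf{I} - \mathbf{h}_i\mathbf{h}_i^H$. The function name `mwf_mse` is hypothetical.

```python
import numpy as np

def mwf_mse(r, s, max_rank):
    """Mean square error of the multistage filter truncated at ranks 0..max_rank.

    r is an N x N covariance (assumed known here) and s a unit-norm vector.
    """
    q, _ = np.linalg.qr(s.reshape(-1, 1), mode="complete")
    b = q[:, 1:].conj().T               # (N-1) x N blocking matrix for s
    var_d0 = np.vdot(s, r @ s).real     # power of d0 = s^H x
    r_x = b @ r @ b.conj().T            # covariance of x0 = B x
    r_xd = b @ r @ s                    # cross-correlation of x0 with d0
    deltas, variances, mse = [], [], [var_d0]
    for _ in range(max_rank):
        delta = np.linalg.norm(r_xd)
        h = r_xd / delta                # next unit basis vector
        deltas.append(delta)
        variances.append(np.vdot(h, r_x @ h).real)      # power of d_i
        proj = np.eye(len(h)) - np.outer(h, h.conj())   # projector onto h-perp
        r_xd = proj @ r_x @ h           # cross-correlation of x_i with d_i
        r_x = proj @ r_x @ proj         # covariance of x_i
        # Backward recursion through the scalar stages.
        xi = variances[-1]
        for i in range(len(deltas) - 1, 0, -1):
            xi = variances[i - 1] - deltas[i] ** 2 / xi
        mse.append(var_d0 - deltas[0] ** 2 / xi)
    return np.array(mse)

rng = np.random.default_rng(3)
n = 8
g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
r = g @ g.conj().T + np.eye(n)          # hypothetical covariance
s = np.ones(n, dtype=complex) / np.sqrt(n)

mse = mwf_mse(r, s, n - 1)
# Output power of the unit-gain colored noise matched filter.
mvdr_power = 1.0 / np.vdot(s, np.linalg.solve(r, s)).real
```

Only scalar divisions appear in the recursion, yet at full rank the error `mse[-1]` matches the value obtained with an explicit matrix inverse; truncating the recursion earlier gives the reduced-rank errors `mse[1]`, `mse[2]`, and so on.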

4.  SUMMARY 

The multistage Wiener filter is derived from a new optimization procedure. This
procedure directly demonstrates that the filter simultaneously performs whitening
via an innovations process and correlation on the whitened data in a stagewise
manner. This result is interpreted as a colored noise matched filter implementation
and more clearly explains the optimal properties of the filter structure.


ABSTRACT

The  Probabilistic  Least  Squares  Tracking  (PLST)  algorithm  is  a  recursive  way  of  estimating  both  the  states  and 
associations  of  mixture  models  and  a  Kalman  predictor/filter  version  of  this  algorithm  is  considered.  The  problem 
of  deinterleaving  superimposed  stable  pulse  trains  from  multiple  radars  can  be  formulated  as  a  mixture  process  in 
both  the  state  transition  and  measurement  models.  Using  a  simple  switched  version  of  the  PLST  Kalman  filter  the 
states  and  associations  can  be  estimated.  The  estimated  associations  are  the  indicator  variables  identifying  which 
pulse  is  associated  with  each  pulse  train  and  is  the  information  required  to  deinterleave  the  various  pulse  trains. 

1.  Introduction 

Consider $M$ independent sources which are dynamically varying in time according to a linear state
space representation and a set of measurements where at each time instant the measurement
comes from only one of the $M$ sources, i.e., the measurements can be considered samples of a
mixture of the $M$ sources. The Probabilistic Multi-Hypothesis Tracking (PMHT) algorithm of
Streit and Luginbuhl [1,2] allows the estimation of both the states and the measurement-to-model
associations by a clever application of the EM algorithm. An alternative is to use a least squares
approach, and a Probabilistic Least Squares Tracking (PLST) method has recently been proposed
by Krieg and Gray [3,4,5] and compared with PMHT.

An important electronic warfare problem in the identification of transmitted radar signals is the
deinterleaving problem. A single receiver receives the superposition of several series of pulses
from different radars, each series being termed a pulse train, and accurately measures the time
of arrival of each pulse. Separating the constituent pulse trains given just the time of arrival
of each pulse is known as the
deinterleaving problem. The times of arrival (TOAs) of stable pulse trains (i.e., pulse trains with
constant pulse repetition intervals (PRI)) can be formulated as a state space representation [6,7,8]
and some approaches to the deinterleaving problem using this formulation have been proposed [8,9].

In  this  paper  we  modify  previous  formulations  of  this  problem  and  of  PLST  by  introducing  a 
mixtures  model  into  the  state  transition  equation  that  is  coupled  to  the  mixtures  model  for  the 
measurements.  This  coupling  is  through  the  associations  that  are  explicit  to  the  PLST  algorithm 
and  provide  a  simple  way  for  the  practical  implementation  of  recursive  PLS  Kalman  filters. 

The results of the application of this approach to a simple three pulse train deinterleaving problem
are given.

2. Kalman Predictor and Filter Estimation - PLS Approach

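Consider a mixture model, where, at time $t_k$, the measurement $z_k$ is generated by one of $M$
linear models, i.e.,

$$z_k = \begin{cases} H_k^{1} x_k^{1} + n_k^{1} \\ \quad\vdots \\ H_k^{M} x_k^{M} + n_k^{M} \end{cases}$$

depending on which model generates the $k$-th data point. For each linear model, the time varying
$H_k^{i}$ are assumed known and the $n_k^{i}$ represent independent noise processes, each with a noise
covariance $R_k^{i}$. The unknown states, $x_k^{i}$, are to be estimated and vary according to

$$x_{k+1}^{i} = F_k^{i} x_k^{i} + w_k^{i}$$

where $E\{w_k^{i} w_l^{iT}\} = Q_k^{i}\,\delta_{kl}$.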


In the PLS approach a set of assignments, $a_k^{p}$, to be estimated from the data is introduced;
the $a_k^{p}$ are zero or one depending on whether the measurement $z_k$ at time $t_k$ was generated by
the $p$-th source. The $a_k^{p}$ can be interpreted as the probability that the measurement at time $t_k$
originated from the $p$-th source. The problem of estimating both the unknown states and the
weights $a_k^{p}$ has been termed probabilistic least squares (PLS) [3]; a batch algorithm for this
was presented in [4] and a recursive least squares version was derived in [5].

The  Kalman  filter  derivation  of  the  PLS  approach  is  to  minimise  the  following  expression  w.r.t  the 
unknown  states  and  associations. 

M  M 

4 =v.  +S£fer  £5 

p=l  /^1 

where 

S.i=Zk-H^2ik-  and  gi  = 

subject  to  the  constraint 

V  k. 


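The recursive solutions for the PLS Kalman predictor, $\hat{x}_{k+1/k}^{p}$, and the PLS Kalman filtered output,
$\hat{x}_{k/k}^{p}$, are then given by

$$\hat{x}_{k+1/k}^{p} = F_k^{p}\, \hat{x}_{k/k}^{p}$$

and

$$\hat{x}_{k+1/k+1}^{p} = \hat{x}_{k+1/k}^{p} + K_{k+1}^{p} \left( z_{k+1} - H_{k+1}^{p}\, \hat{x}_{k+1/k}^{p} \right)$$

where the Kalman gain, $K_{k+1}^{p}$, is given by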

J^P  _  f^lfP  pp  jjpT  nP 
^k+l  ~  \^k+l!  ^k+Vk+l^k+l^k+\ 
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The  recursion  being  finally  completed  by  the  following  update  equations  for  the  error  covariances 
of  the  predicted  and  filtered  states. 


$$P_{k+1/k}^{p} = F_k^{p}\, P_{k/k}^{p}\, F_k^{pT} + Q_k^{p}$$

and



pP  ^  _  pP 
•*  *+!/*+!  “  ^k+llk 


-I 


Similarly  to  previous  approaches  the  estimated  associations  are  given  by 


As  is  their  wont,  the  equations  are  nonlinear  but  coupled  in  such  a  manner  that  allows,  at  each 
time  instant,  an  iterative,  but  computationally  demanding,  solution. 

3.  The  Deinterleaving  of  Stable  Pulse  Trains 

The deinterleaving problem, discussed earlier, may be formulated as a state space problem, by
defining the state of the $i$-th stable pulse train by

$$x_k^{i} = [\, \tau_k^{i} \;\; T_k^{i} \,]^{T}$$

where $\tau_k^{i}$ is the precise time of last occurrence of a pulse from the pulse train and $T_k^{i}$ is the
pulse repetition interval which is allowed to be time varying, but with a very small variance
[6,7,8,9].

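To allow for the fact that $k$ is the received pulse number index rather than time, a switched
transition matrix of the form

$$F^p_k = \begin{cases} F_1 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} & \text{if the pulse at time } t_{k+1} \text{ originates from the } p\text{-th source,} \\[4pt] F_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} & \text{otherwise,} \end{cases}$$

is used, with the corresponding measurement covariances

$$Q^p_k = \begin{cases} Q_1 & \text{if the pulse at time } t_{k+1} \text{ originates from the } p\text{-th source,} \\ Q_2 & \text{otherwise.} \end{cases}$$

[the entries of $Q_1$, which include a jitter variance term, and of $Q_2$ are illegible in source]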


The measurement matrix is time and pulse train invariant and is given by $H^p_k = [1 \;\; 0]$.

The PLS method can be applied in either a recursive Kalman filter or batch mode. Here we
consider the recursive mode described above whereby, given $\{\hat{x}^p_{k/k}, a^p_k\}$, we use the Kalman filter to
get $\hat{x}^p_{k+1/k+1}$ and directly calculate $a^p_{k+1}$. These two steps are coupled through the error terms and
must be done iteratively for each $k$. This necessitates some initialisation, which is detailed below.
The switched nature of the transition matrix can be handled by hard limiting the associations and
dynamically using them to determine which model for the transition matrix should be
implemented.
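
The switched prediction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the state of each pulse train is taken to be [time of last pulse, PRI], the function names (`hard_limit`, `predict_state`, `predict_all`, `predicted_measurement`) are hypothetical, and the covariance update is omitted.

```python
# Switched transition matrices for a pulse train state [time of last pulse, PRI].
F_PULSE = [[1.0, 1.0], [0.0, 1.0]]  # this source emitted the pulse: time advances by one PRI
F_HOLD = [[1.0, 0.0], [0.0, 1.0]]   # another source emitted the pulse: state is unchanged


def hard_limit(associations):
    """Return a 0/1 list with a single 1 at the largest association."""
    best = max(range(len(associations)), key=lambda p: associations[p])
    return [1 if p == best else 0 for p in range(len(associations))]


def predict_state(state, emitted):
    """Propagate one state vector with the transition matrix selected by `emitted`."""
    F = F_PULSE if emitted else F_HOLD
    return [F[0][0] * state[0] + F[0][1] * state[1],
            F[1][0] * state[0] + F[1][1] * state[1]]


def predict_all(states, associations):
    """Predict every pulse train state from soft associations for the next pulse."""
    hard = hard_limit(associations)
    return [predict_state(x, flag == 1) for x, flag in zip(states, hard)]


def predicted_measurement(state):
    """Measurement matrix H = [1 0] picks out the time of the last pulse."""
    return state[0]
```

Only the pulse train with the largest association has its time of last occurrence advanced by one PRI; all other states are carried forward unchanged.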

The iteration at time instant $t_{k+1}$ may be summarised as

(1) Initialise the associations using the prediction errors $\varepsilon^p_{k+1}(0) = z_{k+1} - H^p_{k+1} \hat{x}^p_{k+1/k}$.

(2) Calculate the $a^p_{k+1}(0)$'s.

(3) For $n = 1, 2, \ldots$ till convergence

(3.1) Update the covariances and gains [update equations illegible in source], where the
switched matrices are determined by the associations $a^p_{k+1}$.

(3.2) Update the state estimates according to [equation missing in source].

(3.3) Recalculate the association estimates according to
$\varepsilon^p_{k+1}(n) = z_{k+1} - \ldots$ [remainder illegible in source] and consequently $a^p_{k+1}(n)$.

(3.4) Form "hard assignments": find $\nu = \arg\max_p \{a^p_{k+1}(n)\}$ and set $a^p_{k+1}(n) = \delta_{p\nu}$.


(4) End of the loop over $n$.

An example application of this is a set of three stable pulse trains with PRIs and start times of
$[1, \sqrt{2}+0.4, \pi]$ and [0, 0.3, 4,1] respectively. The initialisation of the algorithm used the actual
measured times of the first three pulses, and the PRIs were randomly set within 2.5% of the exact
PRI. The tight restriction on the initialisation of the PRIs and the fact that the number of pulse
trains is assumed known need further investigation, but could be overcome by first preprocessing
the data with some histogramming technique. The covariances were initialised by setting

$$P^p = \begin{bmatrix} 5 & 0 \\ 0 & 0.01 \end{bmatrix}$$

together with the assumed model noise covariance matrix. Note the use of a high variance on the
time of last occurrence of the $p$-th pulse and the small variance on the PRIs.

At each time update of the Kalman filter the iteration described above was repeated 10 times. The
times of arrival were jittered by adding Gaussian noise with a standard deviation of 0.01, i.e. 1% of
the smallest PRI, and 32 data samples were used.
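
A test record of this kind can be generated with a few lines of code. The sketch below is illustrative only: the function name `simulate_interleaved_toas` is hypothetical, the PRIs follow the example in the text, and the start times are placeholder values chosen for the sketch rather than those used in the paper.

```python
import math
import random


def simulate_interleaved_toas(pris, starts, jitter_std, n_pulses, seed=0):
    """Return the first n_pulses (measured_toa, source_index) pairs in time order."""
    rng = random.Random(seed)
    pulses = []
    for source, (pri, start) in enumerate(zip(pris, starts)):
        # Generating n_pulses per train is comfortably enough to fill the merged record.
        for m in range(n_pulses):
            true_toa = start + m * pri
            pulses.append((true_toa + rng.gauss(0.0, jitter_std), source))
    pulses.sort()
    return pulses[:n_pulses]


# Illustrative values: PRIs as in the text, start times are placeholders.
PRIS = [1.0, math.sqrt(2.0) + 0.4, math.pi]
STARTS = [0.0, 0.3, 0.5]
record = simulate_interleaved_toas(PRIS, STARTS, jitter_std=0.01, n_pulses=32)
```

The source index stored with each time of arrival plays the role of the exact association against which estimated associations can be compared.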

Plotted  below  is  a  sample  run  showing  the  difference  between  the  exact  and  the  estimated 
associations  using  the  PLST  Kalman  filter  just  running  forward  in  time.  Two  errors  were  made  at 
times  around  4  and  13. 


Plotted  below  is  a  sample  run  of  the  Kalman  filter  estimates  of  the  TOAs  of  the  three  pulse  trains 
and  their  PRIs  when  (a)  the  exact  associations  are  known  and  (b)  when  these  are  estimated  as 
outlined  above.  Close  agreement  was  obtained. 


[Figure panels: (a) Associations known exactly; (b) Associations estimated]

Results averaged over an ensemble of realisations indicated that, on average, about 3 errors per 32
points resulted. At the same time, the corresponding errors in the state estimates were very small,
indicating that deinterleaving rather than parameter estimation was important for this example.
For other examples, however, the converse situation can occur.

4. Summary

The problem of deinterleaving pulse trains can be formulated in state space using mixture models
for both the measurement and state transition equations. These are coupled through the unknown
set of associations. A simple modification of the PLST algorithm allows this to be recursively
solved using a Kalman filter. Smoothing can readily be incorporated by considering a batch
process and could show a significant decrease in the errors.

Simulations  indicate  that,  provided  good  initialisation  can  be  obtained,  the  approach  does  show 
some  promise  although  many  issues  associated  with  initialisation  need  resolution. 

5.  References 

[1] R. L. Streit and T. E. Luginbuhl, Probabilistic Multi-Hypothesis Tracking, Technical Report
TR-10,428, NUWC, Newport, RI, 1995.

[2] R. L. Streit and T. E. Luginbuhl, A Probabilistic Multi-Hypothesis Tracking Algorithm without
Enumeration and Pruning, Proceedings of the Sixth Joint Service Data Fusion Symposium,
Laurel, Maryland, 14-18 June 1993, 1015-1024.

[3] M. L. Krieg and D. A. Gray, Comparison of probabilistic least squares and probabilistic
multi-hypothesis tracking algorithms for multi-sensor tracking, Proc. IEEE Int. Conf. on Acoustics,
Speech and Signal Processing (ICASSP-97), Vol. 1, Munich, Germany, 515-518.

[4] D. A. Gray, M. L. Krieg and M. R. Peake, Estimation of the Parameters of Mixed Processes by
Least Squares Optimisation, Proceedings of Fourth International Conference on Optimization:
Techniques and Applications (ICOTA 98), Vol. 1, Perth, Australia, July 1998, 891-898.

[5] D. A. Gray and M. L. Krieg, Recursive Least Squares and Kalman Filtering Approaches to
Tracking the Parameters of Mixture Models, Proceedings of Workshop commun GdR ISIS (GT1)
and NUWC "Approches Probabilistes Multipistes pour l'Extraction Multipistes", Paris, Nov. 1998,
paper 5.

[6] E. T. Kofler and C. T. Leondes, New Approach to the Pulse Train Deinterleaving Problem,
International Journal of Systems Science 20(12), 2663-2671, 1989.

[7] D. A. Gray, B. L. Slocumb and S. D. Elton, Parameter Estimation for Periodic Discrete Event
Processes, Proceedings Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 4, April 1994,
Adelaide, Australia, 93-96.

[8] J. B. Moore and V. Krishnamurthy, Deinterleaving Pulse Trains using Discrete-Time Stochastic
Dynamic-Linear Models, IEEE Trans. on Signal Processing, Vol. 42, No. 11, Nov. 1994, 3092-3103.

[9] B. J. Slocumb and E. W. Kamen, The Pulse Train PDA Analysis and Deinterleaving Filter, SPIE
Vol. 3068, 296-307.




Array  Element  Localisation 
Using  Simulated  Annealing 


Michael  V.  Greening 

Defence  Science  and  Technology  Organisation, 
Salisbury  Site,  MOD  Building  79,  P.O.  Box  1500, 
Salisbury,  S.A.,  5108,  Australia 


Abstract 

Array  processing  techniques  such  as  beamforming  or  matched  field  processing  require 
accurate  knowledge  of  the  location  of  individual  elements  in  the  array.  For  horizontal 
arrays  laid  on  the  ocean  floor,  relative  arrival  times  measured  across  the  array  from 
nearby  implosive  sources  are  often  used  to  aid  in  estimating  the  sensor  positions. 
However,  the  inverse  problem  of  determining  the  sensor  positions  from  the  relative 
arrival  times  is  both  nonunique  and  ill-conditioned.  This  paper  shows  how  simulated 
annealing  can  be  used  to  solve  this  inverse  problem.  Synthetic  studies  show  that 
relative  sensor  locations  can  be  exactly  found  while  tests  with  real  data  show  an 
improvement  in  array  gain  comparable  to  the  theoretical  limit  obtained  from  a 
perfectly  known  array. 


Introduction 

Remotely deployed systems often contain horizontal or vertical arrays mounted
on the ocean floor and used to acoustically monitor areas of the ocean. One problem
with remotely deployed systems is accurately determining the sensor positions
in the array. Conventional beamforming is often considered to require sensor position
estimates accurate to within $\lambda/10$, where $\lambda$ is the wavelength of the signal
measured.¹ More advanced array processing techniques such as adaptive beamforming
or matched field processing require even more accurate estimates of the sensor
positions.

One  technique  often  used  to  help  estimate  the  sensor  positions  in  remote  systems 
is  to  employ  transient  sources  near  the  array  and  measure  the  arrival  times  of  the 
signals  across  the  array.  If  the  location  of  the  sources  and  the  travel  times  to 
the  sensors  are  known,  then  the  location  of  all  the  sensors  in  the  array  can  be 
unambiguously  determined  using  triangulation  from  three  sources.  However,  the 
source  locations  are  often  only  known  approximately  and  the  travel  times  from 
source  to  sensor  are  often  unknown  with  the  arrival  times  at  any  sensor  only  known 
relative  to  the  arrival  times  at  other  sensors.  Unknown  source  positions  can  allow 
for a rotation or translation of the combined system of sensors and sources without
a change in the arrival times. Also, a given set of relative arrival times for a single
source  can  be  exactly  reproduced  by  decreasing  the  source  range  while  increasing  the 
curvature  of  the  array.  Thus,  the  inverse  problem  of  estimating  the  sensor  positions 
from  relative  arrival  times  and  with  unknown  source  locations  is  both  nonunique 
and  ill-conditioned.  The  problem  involves  searching  a  multidimensional  space  of 
estimated  sensor  and  source  positions  to  minimize  the  error  between  the  measured 
and  predicted  relative  arrival  times.  One  technique  designed  specifically  to  search 
ill-conditioned, multidimensional spaces is called simulated annealing.² This paper
will  show  how  to  apply  this  technique  to  the  problem  of  locating  the  sensors  in  a 
remotely  deployed  system.  Both  synthetic  and  real  data  will  be  examined  and  some 
recommendations  on  the  number  and  location  of  sources  will  be  given. 


I  Methodology 

This section shows how to apply simulated annealing to the problem of locating a
horizontal array mounted on the ocean floor. For the real data, a set of light bulbs³
imploded near mid-depth was used as transient sources and the relative arrival
times of the direct arrival and surface reflection were measured across the array.
The problem then is to use simulated annealing to find a set of source and sensor
locations which will reproduce the relative arrival times.
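
The forward model needed for this search can be sketched as below. This is an illustration under simplifying assumptions that are not stated in the paper: a flat bottom, a constant sound speed (1500 m/s is a nominal value), straight-line propagation, and a surface reflection modelled by an image source above the surface. The function names, the choice of the first sensor's direct arrival as the time reference, and the use of summed absolute differences for the error are all hypothetical.

```python
import math


def arrival_times(source, sensors, water_depth, sound_speed=1500.0):
    """Direct and surface-reflected travel times (s) from one source to each sensor."""
    sx, sy, sz = source                      # sz is the source depth below the surface
    times = []
    for x, y in sensors:                     # sensors lie on the bottom at water_depth
        r2 = (x - sx) ** 2 + (y - sy) ** 2
        direct = math.sqrt(r2 + (water_depth - sz) ** 2) / sound_speed
        # Surface reflection: image source at height sz above the surface.
        reflected = math.sqrt(r2 + (water_depth + sz) ** 2) / sound_speed
        times.append((direct, reflected))
    return times


def relative_arrival_times(source, sensors, water_depth, sound_speed=1500.0):
    """Arrival times relative to the direct arrival at the first sensor."""
    times = arrival_times(source, sensors, water_depth, sound_speed)
    reference = times[0][0]
    return [(d - reference, r - reference) for d, r in times]


def total_time_error(measured, modelled):
    """Sum of absolute differences between measured and modelled relative times."""
    return sum(abs(m - p)
               for pair_m, pair_p in zip(measured, modelled)
               for m, p in zip(pair_m, pair_p))
```

Because only time differences enter the error, a common shift of all travel times leaves it unchanged, which is one source of the non-uniqueness discussed in the introduction.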

Simulated annealing involves a series of iterations in which the unknown parameters
(i.e. source and sensor locations) are perturbed. For each iteration, the
relative arrival times of the direct arrival and surface reflection are calculated for
the modelled parameters. The modelled arrival times are then compared with the
measured arrival times, and the total time error $E$ is used as an estimate of the
goodness of fit of the modelled source and sensor positions to their true values. For
successive iterations, the change in error $\Delta E$ is calculated. If the error has decreased
($\Delta E < 0$), the new parameter configuration is accepted. If the error has increased
($\Delta E > 0$), the new configuration is accepted with a probability $P$ given by the
Boltzmann distribution:

$$P(\Delta E) = \exp(-\Delta E / T), \qquad (1)$$

where $T$ is a controlling parameter analogous to temperature in the physical process
of annealing. Accepting some perturbations which increase $E$ allows the algorithm
to escape from local suboptimal minima in the search space. Decreasing $T$ with
successive iterations decreases the probability of accepting an increase in error, and
the algorithm eventually converges to a solution which should approximate the global
minimum.
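
The acceptance rule of equation (1) can be sketched as follows. This is a minimal illustration with hypothetical function names; treating $\Delta E = 0$ as an accepted move is an assumption, since the text only covers $\Delta E < 0$ and $\Delta E > 0$.

```python
import math
import random


def acceptance_probability(delta_e, temperature):
    """Probability of accepting a perturbation that changes the error by delta_e."""
    if delta_e <= 0:
        return 1.0                               # improvements are always accepted
    return math.exp(-delta_e / temperature)      # equation (1)


def accept(delta_e, temperature, rng=random):
    """Draw a uniform random number and decide whether to accept the perturbation."""
    return rng.random() < acceptance_probability(delta_e, temperature)
```

For a fixed increase in error the acceptance probability falls as the temperature is lowered, so uphill moves become progressively rarer.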

Two factors involved in developing an efficient and effective simulated annealing
algorithm are the method of decreasing the temperature $T$ and the method of
perturbing the unknown parameters. A starting temperature $T_0$ was chosen which
allows at least 90% of all perturbations to be accepted. A number of perturbations
($\eta$) are then performed before decreasing the temperature according to $T_{j+1} = \alpha T_j$,
where $\alpha < 1$. The process is terminated when further temperature steps do not
result in a lower error. The values of $\eta$ and $\alpha$ to use depend on the difficulty of the
inversion. Increasing both $\eta$ and $\alpha$ should decrease the final error but also increases
the number of iterations and time required. An estimate of $\eta$ and $\alpha$ can be obtained
by using synthetic data and choosing $\eta$ and $\alpha$ large enough that the final error is
zero or the resulting sensor locations are accurate to within an acceptable tolerance.
With real data, $\eta$ and $\alpha$ can be initialized to the values obtained from the synthetic
study and then increased until a further increase in their values does not decrease
the final error.
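
The cooling schedule can be sketched as a generic loop. This is an illustration rather than the author's code: the function name `anneal`, the `patience` parameter (the number of consecutive temperature steps without a lower error before stopping), and the `max_steps` safety cap are assumptions introduced here, and the error and perturbation functions are supplied by the caller.

```python
import math
import random


def anneal(initial, error_fn, perturb_fn, t0, alpha=0.95, eta=100,
           patience=5, max_steps=1000, seed=0):
    """Geometric-cooling simulated annealing; returns (best_params, best_error)."""
    rng = random.Random(seed)
    current, current_err = initial, error_fn(initial)
    best, best_err = current, current_err
    temperature, stalled = t0, 0
    for _ in range(max_steps):
        improved = False
        for _ in range(eta):                       # eta perturbations per temperature
            candidate = perturb_fn(current, rng)
            candidate_err = error_fn(candidate)
            delta = candidate_err - current_err
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_err = candidate, candidate_err
                if current_err < best_err:
                    best, best_err, improved = current, current_err, True
        stalled = 0 if improved else stalled + 1
        if stalled >= patience:                    # no lower error for several steps
            break
        temperature *= alpha                       # T_{j+1} = alpha * T_j
    return best, best_err
```

Larger values of `eta` and `alpha` slow the cooling and lengthen the run, which mirrors the trade-off between final error and computation time described above.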

The method of perturbing the parameters can have a major effect on the efficiency
of simulated annealing. Changing only one parameter at a time allows the
algorithm  to  converge  for  a  sensitive  parameter  while  continuing  to  search  for  less 
sensitive  parameters.  Changing  multiple  parameters  in  one  perturbation  allows  for 
quicker  convergence  when  coupled  parameters  are  involved  and  also  allows  for  easier 
jumping  between  local  minima.  For  our  problem,  the  unknown  parameters  are  the 
source  and  sensor  locations  along  with  the  bottom  depth,  which  was  assumed  to 
be  known  only  within  20  m  for  the  purposes  of  the  study.  For  every  perturbation 
a  source,  sensor  or  bottom  depth  is  randomly  chosen  to  be  changed.  If  a  source  is 
picked,  the  position  of  only  a  single  source  is  changed  for  a  given  perturbation.  If 
a  sensor  is  picked,  either  a  single  sensor,  multiple  sensors  or  the  entire  array  can 
be  changed  in  the  following  manner.  The  entire  array  can  be  changed  by  shifting 
it  horizontally  or  by  rotating  it  about  some  angle.  A  single  sensor  can  be  changed 
by  changing  its  distance  or  bearing  from  the  previous  sensor  without  moving  other 
sensors,  providing  that  the  separation  between  pairs  of  sensors  does  not  exceed  the 
length  of  cable  joining  them.  Multiple  sensors  can  also  be  changed  by  moving  all 
sensors  before  or  after  the  sensor  picked  as  above  by  the  same  change  given  to  that 
sensor.  Finally,  when  changing  a  parameter,  it  is  changed  in  one  of  two  ways.  Either 
a  new  value  is  picked  within  a  Gaussian  distribution  centered  on  the  current  value 
or  a  new  random  value  is  chosen  from  the  entire  allowable  range  for  that  parameter. 
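
The two ways of changing a parameter, together with the cable-length check, can be sketched as follows. The function names, the probability `p_global` of drawing from the whole range, and the clamping of Gaussian proposals to the allowed range are assumptions made for this illustration; the paper does not give these details.

```python
import math
import random


def perturb_value(value, low, high, sigma, p_global=0.1, rng=random):
    """Propose a new parameter value inside the allowed range [low, high]."""
    if rng.random() < p_global:
        return rng.uniform(low, high)          # new random value from the entire range
    candidate = rng.gauss(value, sigma)        # Gaussian step centred on the current value
    return min(max(candidate, low), high)      # assumption: clamp to the allowed range


def cable_allows(sensor_a, sensor_b, cable_length):
    """True if two sensors are no further apart than the cable joining them."""
    return math.dist(sensor_a, sensor_b) <= cable_length
```

A proposed sensor move would be discarded whenever `cable_allows` returns False for the sensor and either of its neighbours.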

After  the  simulated  annealing  algorithm  stops,  a  gradient  descent  algorithm  was 
applied  using  the  result  of  the  simulated  annealing  as  the  initial  estimate.  This  is 
used  to  ensure  that  the  absolute  minimum  of  the  current  trough  is  found  and  can 
help  to  speed  up  simulated  annealing  after  a  suitably  cool  temperature  is  reached. 


II  Results 

The  simulated  annealing  algorithm  for  array  element  localisation  was  tested  on 
both  real  and  synthetic  data.  The  real  data  was  collected  at  the  RDS-2  trial  in 
November  1998  in  110  m  water  in  the  Timor  Sea  off  the  northern  coast  of  Australia. 
The data examined was collected on the ULITE array deployed by the Space and
Naval Warfare Systems Center (SPAWAR), San Diego, CA, USA.

The planned deployment of the ULITE array and light bulbs is shown in Fig. 1.
The  array  consists  of  three  arms  of  32  sensors  tied  in  the  center  and  with  each  arm 
containing  a  slight  curvature  to  break  the  left/right  symmetry  from  the  arm.  The 
planned  light  bulb  locations  were  at  mid-depth  with  three  light  bulbs  on  each  side 
of  each  arm,  approximately  100  m  distance  from  the  arm.  Using  Fig.  1  to  generate 
synthetic  data,  the  simulated  annealing  algorithm  was  tested  with  the  following 
uncertainties:  the  center  of  the  array  was  assumed  to  be  known  to  within  100  m; 
the  bearing  or  range  to  any  sensor  was  unrestricted  except  that  the  range  between 
a  pair  of  sensors  could  not  be  greater  than  the  length  of  cable  separating  the  pair; 
the  horizontal  position  of  a  light  bulb  was  assumed  to  be  known  within  100  m; 
the  depth  of  a  light  bulb  was  assumed  to  be  known  only  within  20  m;  the  water 
depth  was  assumed  to  be  known  within  20  m.  With  the  above  uncertainties,  if  the 
relative arrival times were known exactly (i.e. not digitised), then the relative array
shape  and  light  bulb  positions  were  found  exactly.  Absolute  positions  could  shift  or 
rotate  as  long  as  all  light  bulb  positions  stayed  within  the  allowed  uncertainty.  If 
the  relative  times  were  only  known  within  a  digitisation  sample,  then  the  relative 
position  of  any  sensor  could  shift  from  its  true  relative  position  by  as  much  as  the 
distance  travelled  by  sound  within  the  time  of  the  digitisation  sample.  Increasing  the 
number  of  light  bulbs  decreases  the  positional  shift  introduced  by  the  digitisation. 
The relative arrival times of the direct arrival and bottom reflection for the light
bulb at (100 m, 300 m) are shown in Fig. 2.


Figure 1: Trial layout plan
Figure 2: Relative arrival times

The  estimated  shape  of  the  ULITE  array  is  shown  in  Fig.  3  along  with  the 
shape  estimated  by  SPAWAR.  Although  ground  truth  is  unavailable,  the  similarity  of 
shape  provides  some  confidence  that  the  correct  shape  was  obtained.  The  SPAWAR 
estimate  assumed  fixed  light  bulb  locations  at  mid-depth  and  at  the  recorded  GPS 
positions.  The  simulated  annealing  technique  used  at  DSTO  allowed  uncertainties 
of  100  m  in  the  horizontal  location,  and  20  m  in  the  depth  of  a  light  bulb.  This 
makes  the  technique  more  robust  to  errors  in  estimated  light  bulb  positions  and 
allows  for  drift  in  the  light  bulb  as  it  is  lowered  to  depth.  For  individual  arms  of 
the array, the relative shapes estimated by DSTO and SPAWAR are very similar, as
shown in Fig. 4. Although the relative shapes of individual arms are very similar,
the relative shapes of the entire array show a difference of a 2° rotation in the
direction of the northward pointing arm relative to the other two arms. The
difference between the two estimated shapes is believed to be caused by the
location of the light bulbs, which were not as tightly concentrated along the arms
of  the  array  as  in  the  plan.  Consider  a  light  bulb  which  is  near  endfire  to  one  arm 
and  near  broadside  to  another.  A  shift  in  the  light  bulb  position  can  cause  a  large 
change  in  the  relative  arrival  times  across  the  broadside  array  but  very  little  change 
across  the  endfire  array.  Thus,  having  many  light  bulbs  near  endfire  of  one  arm 
can  cause  a  relative  shift  in  the  heading  between  two  arms  of  the  array  with  little 
difference  in  the  relative  arrival  times.  Determining  the  number  and  location  of  light 
bulbs  required  to  accurately  estimate  an  array  shape  is  array  dependent.  A  study 
of  a  single,  nearly  linear,  array  of  200  m  length  showed  that  four  light  bulbs  with 
two  along  one  side,  one  along  the  other  side  and  one  near  endfire  always  provided 
solutions  accurate  to  within  the  digitisation  rate,  provided  that  the  light  bulbs  were 
within  200  m  of  the  array. 


Figure 3: Estimated trial layout
Figure 4: Estimated shapes of array arms

The  errors  between  the  measured  and  modelled  arrival  times  are  shown  in  Fig.  5. 
Each  horizontal  line  shows  the  error  in  arrival  times  for  both  the  direct  arrival  and 
surface  reflection  from  all  light  bulbs.  Nearly  all  errors  fall  within  the  digitisation 
sample  size  of  0.002  sec,  providing  further  confidence  that  the  true  array  shape  is 
well  approximated.  Sensor  32  is  the  outermost  sensor  on  the  northward  pointing 
arm  of  the  array  and  was  connected  to  a  surface  buoy.  It  is  believed  that  the  buoy 
was  causing  this  sensor  to  move  and  thus,  an  accurate  estimate  of  its  position  could 
not  be  found,  and  it  contains  large  errors  in  the  arrival  time  estimates.  This  was 
also  found  by  SPAWAR.  This  sensor  is  not  plotted  in  Fig.  3. 

A final indication of the accuracy of the estimated array shape can be obtained
by finding the array gain provided when using the estimated shape in beamforming.
Beamforming was performed using the eastward pointing arm on an 80 Hz tonal
target at 100° relative to North. Only the 16 sensors that were cut for 24 Hz were
used. The beamformed output is shown in Fig. 6 and indicates an approximate
6 dB gain when the estimated array shape is used compared to a linear array. This
matches the theoretical loss for a signal arriving on the curved array but processed
assuming a straight array. Again, this is a strong indication that the true array
shape is well represented.

Figure 5: Arrival time errors
Figure 6: Beamformed output of east arm

III Summary

This paper has shown how simulated annealing can be used to accurately perform
array element localisation on remotely deployed systems. Synthetic studies
have shown that the relative sensor positions can be determined within the accuracy
defined by the digitisation rate if sufficient light bulbs are employed nearby
along the array and at endfire to the array. Tests with real data agreed well with
an independently performed localisation estimate, and also improved the response
of conventional beamforming to approximately the theoretical limit.


We describe an algorithm for finding Pareto-optimal paths in a multicriteria
shortest path problem. We use the algorithm to find approximate
solutions to the problem of guiding a mobile object such as a submarine
from one location to another, through a field of sensors at known positions,
within a fixed time period and with minimum probability of detection.


1.  INTRODUCTION 

Our goal is to guide a submarine from one location to another, through a field of
sensors at known positions, within a fixed time period and with minimum probability
of detection. The techniques presented here can be used in other applications,
for example to find approximately optimal flight paths for aircraft [7], or to plan
paths requiring obstacle avoidance in robotics [6]. The mathematical problem that
underpins our approach is a discrete one involving undirected graphs, and we first
discuss that.


2.  A  MULTICRITERIA  OPTIMISATION  PROBLEM 

We are given a directed graph $\mathcal{G} = [\mathcal{V}, \mathcal{E}]$, where $\mathcal{V}$ and $\mathcal{E}$ are the finite sets
of vertices and edges respectively, and specified start vertex $s$ and target vertex
$t$. We allow more than one edge between a pair of vertices, so that we will be
able to incorporate multiple speed options in our application. We are also given
$k$ non-negative functions $f_1, f_2, \ldots, f_k$ defined on $\mathcal{E}$, where for each $j = 1, 2, \ldots, k$
the number $f_j(e)$ represents a cost associated with the edge $e \in \mathcal{E}$, such as a
measurement of the time required to traverse $e$ or of the danger of being detected
whilst on $e$.

A path from vertex $v$ to vertex $v'$ in $\mathcal{G}$ is usually described as a sequence of edges
$p = (e_1, e_2, \ldots, e_{r-1})$, where $e_i$ is an edge joining $v_i$ to $v_{i+1}$ for $i = 1, 2, \ldots, r-1$, and
$v_1 = v$ and $v_r = v'$. We only consider paths that start at $s$, and so we say that a
path is a path to the vertex $v$ if it is a path from $s$ to $v$. We assume that there
is at least one path to every vertex in $\mathcal{V}$, and include here the 'trivial path to $s$'
containing only the vertex $s$, as this is needed to initiate our algorithm.

The classical shortest path problem arises when there is only one cost function,
and the task is to find a path $p$ from $s$ to $t$ with minimal cost $c_1(p) = \sum_{e \in p} f_1(e)$,
where the sum is taken over all those edges $e$ that form $p$. There are many efficient
polynomial time algorithms, such as Dijkstra's algorithm, for doing this.

In the multicriteria shortest path problem (where $k > 1$), we let $c_j(p) = \sum_{e \in p} f_j(e)$
for each $j$, and $C(p) = (c_1(p), c_2(p), \ldots, c_k(p))$ be the cost vector associated with the
path $p$. There is a natural partial order on cost vectors defined by $C(p) \le C(p')$ if
and only if $c_j(p) \le c_j(p')$ for $j = 1, \ldots, k$. We say that a path $p$ to a vertex $v$ dominates
another path $p'$ to the same vertex if $C(p) < C(p')$, that is, if $C(p) \le C(p')$ and
$C(p) \ne C(p')$. As there may be no single
dominant path to $t$, the problem is one of finding Pareto-optimal paths, which are
those that are not dominated by any other path; that is, those corresponding to
minimal cost vectors. Each such path is optimal in that no improvement can be
achieved in one cost function without worsening at least one of the others.
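
The dominance relation and the resulting set of minimal cost vectors can be sketched directly. The function names are hypothetical, and the brute-force filter below is meant only to illustrate the definitions; it is not the path algorithm described later.

```python
def leq(c1, c2):
    """Partial order on cost vectors: every component of c1 is <= that of c2."""
    return all(a <= b for a, b in zip(c1, c2))


def dominates(c1, c2):
    """c1 dominates c2 if c1 <= c2 componentwise and the vectors differ."""
    return leq(c1, c2) and tuple(c1) != tuple(c2)


def pareto_minimal(cost_vectors):
    """Return one copy of each cost vector that no other vector dominates."""
    unique = []
    for c in cost_vectors:
        c = tuple(c)
        if c not in unique:          # equivalent paths share a cost vector
            unique.append(c)
    return [c for c in unique
            if not any(dominates(other, c) for other in unique)]
```

For instance, with cost vectors (1, 3), (2, 2) and (2, 3), the first two are minimal and incomparable, while (2, 3) is dominated by both.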

The task of determining all Pareto-optimal paths is computationally complex.
Typically, ad hoc single Pareto-optimal paths are found using methods such as
the weighted sum approach, the $\varepsilon$-constraint method and goal programming. Genetic
algorithms have also been used, but their effectiveness is closely tied to the
method used to assign fitness [1]. Recent literature also describes algorithms that
simultaneously find all Pareto-optimal paths [2], [3], [5].

In practice there may be more than one path to a specified vertex corresponding
to a given cost vector $C(p)$, and we say that all such paths are equivalent. For many
applications it is sufficient to determine just one Pareto-optimal path to $t$ for each
minimal cost vector. Tung and Chew [8], [9] give an algorithm that does this
for simple graphs. In the next section we describe modifications to this algorithm
that improve its efficiency and make it appropriate for non-simple graphs.

3.  THE  PARETO  ALGORITHM 

The basic Pareto algorithm determines one Pareto-optimal path to $t$ for each
minimal cost vector. It iteratively generates paths from the start vertex $s$. At each
iteration it selects a path $p = (e_1, e_2, \ldots, e_{r-1})$, the test path, to some vertex $v'$,
and analyses all one-edge extensions $[p : e] = (e_1, e_2, \ldots, e_{r-1}, e)$, where $e$ is an edge
joining $v'$ to a neighbouring vertex $v$. It uses pruning criteria to reject an extension
as inadmissible if it is dominated by a known path to $v$, or if it can be shown
that each possible completion of $[p : e]$ to a path to $t$ is equivalent to, or dominated
by, one already found. The latter is decided using a heuristic that provides a lower
bound for the cost of such completions.

The  algorithm  uses  a  vertex  labelling  scheme  to  keep  track  of  possible  test  paths 
at  each  iteration,  and  selects  from  amongst  these  on  the  basis  of  a  selection  function 
that  incorporates  the  heuristic. 

3.1.  Components  of  the  algorithm 

We  briefly  discuss  the  important  selection  function  and  pruning  criteria  before 
describing  the  Pareto-algorithm. 

3.1.1.  Selection  function 

Although only one selection function is required for the basic Pareto algorithm,
we introduce a family of such functions that may be used in a later modification of
the algorithm.

A selection function assigns a numerical value to each potential test path. To see
how, we first let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)$ denote a vector of non-negative weights such
that $\operatorname{tr} \alpha = 1$, where the trace $\operatorname{tr}$ of a vector is the sum of its components. The
special weight vectors $\varepsilon(j)$, where $j = 1, \ldots, k$, for which the $j$-th coordinate is 1 and
the others all equal 0, and positive weight vectors $\alpha$, for which each $\alpha_j$ is positive,
will be important.

For each $\alpha$ and each path $p$, we let $W_\alpha(p) = \alpha_1 c_1(p) + \ldots + \alpha_k c_k(p)$, and for
each vertex $v$, $h_\alpha(v) = \min_p W_\alpha(p)$, where the minimum is taken over all paths
from $v$ to $t$. (In particular, $W_{\varepsilon(j)} = c_j$ for each $j$.) Note that the weights reflect
the relative influence of the cost functions only if the cost functions $f_j$ have similar
numerical ranges. The values $h_\alpha(v)$ are easily computed by applying Dijkstra's
algorithm to the simple graph $\mathcal{G}' = [\mathcal{V}, \mathcal{E}']$, obtained by putting an edge between
two vertices $v'$ and $v$ whenever there is at least one edge between them in $\mathcal{G}$, and
weighting that edge by the minimum of $\alpha_1 f_1(e) + \ldots + \alpha_k f_k(e)$, where the minimum
is taken over all edges $e$ between $v'$ and $v$ in $\mathcal{G}$. We let $p_\alpha$ denote the path
constructed to $t$ for which $W_\alpha(p_\alpha) = h_\alpha(s)$. We can assume that $p_\alpha$ is Pareto-optimal,
although this will require us to modify Dijkstra's algorithm if some of the $\alpha_j$ equal 0.

For each positive weight vector $\alpha$, we define the selection value $S_\alpha(p)$
of a path $p$ to a vertex $v$ to be $S_\alpha(p) = W_\alpha(p) + h_\alpha(v)$. It gives
a lower bound on the corresponding weighted sum of costs of paths to $t$ that are
extensions of $p$. The Pareto algorithm uses such a selection function, choosing a test
path at each iteration from amongst those known paths with minimum selection value. It
is easy to check that the positivity of $\alpha$ ensures that a selected test path
cannot be dominated by any other path to the same vertex, and so must be Pareto-optimal.
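As an illustration, the following Python sketch computes the heuristic values and a selection value on a toy multigraph. It is a minimal sketch under stated assumptions: edges are taken to be directed, each edge carries a (time, danger) cost pair, and the names `collapse`, `h_alpha`, `selection_value`, `EDGES` and `ALPHA` are hypothetical and do not come from the text.

```python
import heapq

def collapse(edges, alpha):
    """Build the simple graph: for each ordered vertex pair (u, v) keep the
    smallest alpha-weighted cost among the parallel edges from u to v."""
    best = {}
    for u, v, costs in edges:
        w = sum(a * c for a, c in zip(alpha, costs))
        if w < best.get((u, v), float("inf")):
            best[(u, v)] = w
    return best

def h_alpha(edges, alpha, target):
    """Dijkstra run backwards from the target: h[v] is the least
    alpha-weighted cost of any path from v to the target."""
    incoming = {}
    for (u, v), w in collapse(edges, alpha).items():
        incoming.setdefault(v, []).append((u, w))
    h = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > h.get(v, float("inf")):
            continue  # stale queue entry
        for u, w in incoming.get(v, []):
            if d + w < h.get(u, float("inf")):
                h[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return h

def selection_value(path_cost, end_vertex, alpha, h):
    """Weighted cost of the path so far plus the heuristic at its end vertex."""
    return sum(a * c for a, c in zip(alpha, path_cost)) + h[end_vertex]

# Toy multigraph (assumed data): (tail, head, (time, danger)),
# with two parallel edges from s to a.
EDGES = [
    ("s", "a", (1.0, 4.0)),
    ("s", "a", (3.0, 1.0)),
    ("a", "t", (2.0, 2.0)),
    ("s", "t", (6.0, 1.0)),
]
ALPHA = (0.5, 0.5)
H = h_alpha(EDGES, ALPHA, "t")
```

Because the heuristic never overestimates the remaining weighted cost, the selection value of a partial path is a lower bound on the weighted cost of any completion, which is the property the text relies on.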

3.1.2.  Pruning  criteria 

Clearly  the  number  of  potential  test  paths  may  grow  exponentially  with  the 
number of vertices and edges in $\mathcal{G}$, and so it is essential to eliminate unnecessary
paths  as  soon  as  possible. 

For each vertex $v$ and integer $1 \leq j \leq k$, let $q_j(v) = \min_p c_j(p)$, where
the minimum is taken over all paths from $v$ to the target vertex $t$; that is,
$q_j(v) = h_{e^{(j)}}(v)$. Furthermore, let $Q(v) = (q_1(v), q_2(v), \ldots, q_k(v))$.
Then an extension $[p : e]$ to vertex $v$ of the current test path $p$ to $v'$ is
rejected, in the sense that it is not added to the set of possible test paths, if either

(P1) $C([p : e]) \geq C(p')$ for some known path $p'$ to $v$, or

(P2) $C([p : e]) + Q(v) \geq C(p')$, where $p'$ is a known path to $t$.

The first condition identifies extensions to $v$ that are dominated by a known path
to $v$. The second identifies extensions that can at best be completed to paths that
are equivalent to or dominated by known ones. This is a valid rejection criterion
because we only want one path from each equivalence class of Pareto-optimal paths.
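The two rejection tests can be sketched in Python as follows. This is a hedged illustration: it assumes that inequalities between cost vectors are read componentwise, and the function names and argument layout are hypothetical.

```python
def leq(c1, c2):
    """True when cost vector c1 is no larger than c2 in every component."""
    return all(x <= y for x, y in zip(c1, c2))

def reject_extension(cost_ext, v, known_to_v, known_to_t, Q):
    """Apply (P1) and (P2) to an extension with cost vector cost_ext ending at v.

    known_to_v : cost vectors of known paths to v
    known_to_t : cost vectors of known paths to the target
    Q          : dict mapping a vertex to its vector of per-cost minima
    """
    # (P1): some known path to v is at least as good in every cost.
    if any(leq(c, cost_ext) for c in known_to_v):
        return True
    # (P2): even the most optimistic completion is no better than a known
    # path to the target.
    bound = tuple(c + q for c, q in zip(cost_ext, Q[v]))
    return any(leq(c, bound) for c in known_to_t)
```

Note that (P2) compares against an optimistic bound that no single completion need attain, so it can only reject extensions whose every completion is equivalent to or dominated by a known path.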

Let $F_0$ be the set of Pareto-optimal paths constructed to $t$ when calculating
$Q(t)$ and $h_\alpha(s)$ for the chosen $\alpha$. We can use $F_0$ to seed the pruning
process. As Dijkstra’s algorithm is quadratic, it may be worthwhile generating a
reasonably large set $F_0$ by including paths $p_\alpha$ for several choices of $\alpha$.

3.2. The steps of the Pareto algorithm

We briefly describe the essential steps of the Pareto algorithm as follows:

A1 Fix a selection function $S_\alpha$ (most commonly
$\alpha = (1/k, 1/k, \ldots, 1/k)$), and use Dijkstra’s algorithm to calculate the
corresponding values $h_\alpha(v)$ and $Q(v)$ for each vertex $v$. Generate additional
paths in $F_0$ if desired.

A2 Let $\Sigma$ denote the set of possible test paths, and $F$ the set of known
Pareto-optimal paths to $t$, at the current iteration. The algorithm is initialised by
setting $\Sigma$ equal to the set containing only the trivial path (that is, the path
with no edge) and $F = F_0$.

A3 Pick a test path $p$ by finding the path in $\Sigma$ that has the minimum selection
value. If there are no such paths, stop. Remove $p$ from $\Sigma$.

A4 For each one-edge extension $[p : e]$ of $p$,

(i) reject $[p : e]$ if it satisfies either (P1) or (P2), with $p' \in F$, or

(ii) if $[p : e]$ is an admissible path to $t$, add $[p : e]$ to $F$, and go to step A5, or

(iii) if $[p : e]$ is an admissible path to a vertex other than $t$, calculate its
selection value and add $[p : e]$ to $\Sigma$.

Go to step A3.

A5 For each path $p'$ in $\Sigma$ to vertex $v$, reject $p'$ if
$C(p') + Q(v) \geq C([p : e])$. Return to step A4.

When  the  algorithm  stops,  F  contains  a  complete  set  of  inequivalent  Pareto- 
optimal  paths,  one  for  each  cost  vector.  They  have  been  found  in  increasing  order 
of  the  selection  values. 
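The steps above can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation, and it makes several assumptions that are not in the text: edges are directed, cost vectors are compared componentwise, step A5 is applied lazily when a path is taken from the queue, paths are stored as vertex tuples instead of labels, and a final filter is added as a safeguard to discard any stored path that a later discovery dominates. The toy graph and all names are hypothetical.

```python
import heapq
from itertools import count

def backward_dijkstra(edges, weight, target):
    """Least total weight from every vertex to the target."""
    dist = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for tail, head, costs in edges:
            if head == v and d + weight(costs) < dist.get(tail, float("inf")):
                dist[tail] = d + weight(costs)
                heapq.heappush(heap, (dist[tail], tail))
    return dist

def leq(c1, c2):
    """Componentwise c1 <= c2."""
    return all(x <= y for x, y in zip(c1, c2))

def add(c1, c2):
    return tuple(x + y for x, y in zip(c1, c2))

def pareto_paths(edges, source, target, alpha):
    k = len(alpha)
    weighted = lambda c: sum(a * x for a, x in zip(alpha, c))
    h = backward_dijkstra(edges, weighted, target)                     # A1
    q = [backward_dijkstra(edges, lambda c, j=j: c[j], target) for j in range(k)]
    Q = {v: tuple(q[j][v] for j in range(k)) for v in h}
    tie = count()               # tie-breaker, so the heap never compares paths
    zero = (0.0,) * k
    sigma = [(h[source], next(tie), zero, (source,))]                  # A2
    known = {source: [zero]}    # cost vectors already generated, per vertex
    found = []                  # the set F, as (cost vector, vertex tuple)
    while sigma:                                                       # A3
        _, _, cost, path = heapq.heappop(sigma)
        v_from = path[-1]
        if any(leq(fc, add(cost, Q[v_from])) for fc, _ in found):      # lazy A5
            continue
        for tail, v, edge_cost in edges:                               # A4
            if tail != v_from or v not in h:
                continue
            ext = add(cost, edge_cost)
            if any(leq(kc, ext) for kc in known.get(v, [])):           # (P1)
                continue
            if any(leq(fc, add(ext, Q[v])) for fc, _ in found):        # (P2)
                continue
            known.setdefault(v, []).append(ext)
            if v == target:
                found.append((ext, path + (v,)))
            else:
                heapq.heappush(
                    sigma, (weighted(ext) + h[v], next(tie), ext, path + (v,)))
    # Safeguard added in this sketch: drop any stored path that is dominated.
    return [(c, p) for c, p in found
            if not any(leq(d, c) and d != c for d, _ in found)]

# Toy multigraph (assumed data): (tail, head, (time, danger)).
EDGES = [
    ("s", "a", (1.0, 4.0)), ("s", "a", (3.0, 1.0)), ("a", "t", (2.0, 2.0)),
    ("s", "t", (6.0, 1.0)), ("s", "b", (1.0, 1.0)), ("b", "t", (1.0, 8.0)),
    ("b", "a", (1.0, 1.0)), ("s", "t", (7.0, 7.0)),
]
RESULT = pareto_paths(EDGES, "s", "t", (0.5, 0.5))
```

On this toy graph the sketch returns five inequivalent paths, including one with cost (5, 3) that no weighted-sum shortest-path search would return, since for every weight some other path has a smaller weighted cost.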

It is worth mentioning the labelling scheme that keeps track of the paths that
are generated. Each vertex $v \neq s$ is given a finite sequence of labels
$\theta^1(v), \theta^2(v), \ldots$, where each label represents a path to $v$, and
where $\theta^n(v)$ is assigned to $v$ the $n$th time it is used to produce an
admissible one-edge extension of a test path. The labels are assigned iteratively in
the following sense. Suppose that the vertex $v$ is used for the $n$th time when an
edge $e$ from $v'$ to $v$ is added to a test path $p$ to produce an admissible
extension $[p : e]$, and suppose that $\theta^i(v')$ is the label corresponding to $p$.
Then we set $\theta^n(v) = [e, i, C([p : e])]$. The third component $C([p : e])$ is not
used to specify a path, but just to record the cost of $[p : e]$. The scheme is
initialised by setting $\theta^1(s) = [-, -, 0]$.
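A labelling scheme of this kind can be sketched in Python as below. The representation is an assumption made for illustration: labels are kept in a dictionary of lists, an edge is a (tail, head, name) triple, and the function names are hypothetical.

```python
def add_label(labels, v, edge, pred_index, cost):
    """Append the label [edge, pred_index, cost] to vertex v and return its
    1-based position n in the label sequence of v."""
    labels.setdefault(v, []).append((edge, pred_index, cost))
    return len(labels[v])

def trace_path(labels, v, n):
    """Recover the edge sequence of the path represented by the n-th label of v
    by following predecessor indices back to the source."""
    path = []
    while True:
        edge, pred_index, _ = labels[v][n - 1]
        if edge is None:              # the initial label of the source
            break
        path.append(edge)
        v, n = edge[0], pred_index    # step back to the tail of the edge
    return path[::-1]

# The source starts with the label [-, -, 0]; here the cost is a 2-vector.
labels = {"s": [(None, None, (0, 0))]}
n1 = add_label(labels, "a", ("s", "a", "slow"), 1, (3, 1))
n2 = add_label(labels, "a", ("s", "a", "fast"), 1, (1, 4))
n3 = add_label(labels, "t", ("a", "t", "slow"), n2, (3, 6))
```

Each label stores only one edge and one index, so memory grows with the number of admissible extensions rather than with the total length of the stored paths.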

3.3. Locating special Pareto-optimal paths

We describe modifications to the Pareto algorithm that may be used when it is
sufficient to find Pareto-optimal paths to $t$ that satisfy certain cost restrictions,
as will be the case in our application.

Suppose firstly that we wish to locate only Pareto-optimal paths $p'$ that satisfy
prescribed bounds on some or all of their costs. For example, we may require
$c_m(p') \leq \tau_m$ for some $m$, where the $\tau_m$ give the bounds. Then the pruning
criteria should be modified by adding: a one-edge extension $[p : e]$ of the test path
$p$ to vertex $v$ is rejected if

(P3) $c_m([p : e]) + q_m(v) > \tau_m$ for the relevant $m$.

If such bounds are given for every $m$ then
$W_\alpha(p') \leq \alpha_1 \tau_1 + \ldots + \alpha_k \tau_k$ for every weight vector
$\alpha = (\alpha_1, \ldots, \alpha_k)$. This suggests an additional pruning criterion:
that $[p : e]$ be pruned if

(P4) $S_\alpha([p : e]) > \alpha_1 \tau_1 + \ldots + \alpha_k \tau_k$, where $\alpha$ is
the index of the selection function in use.

In  our  application,  we  will  seek  a  Pareto-optimal  path  to  t  of  least  danger  for 
which  the  time  cost  is  at  most  some  prescribed  amount.  Such  a  path  can  be  found 
efficiently  by  updating  the  right  hand  side  of  (P3)  and  (P4)  on  the  basis  of  the  least 
dangerous  known  Pareto-optimal  path  which  does  not  exceed  the  time  limit. 
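The bound-based criteria (P3) and (P4) might be coded along the following lines. This is a sketch under assumptions: the bounds are held in a dictionary `tau` indexed by cost number (starting from 0), the per-vertex minima and heuristic values are precomputed, and all names are hypothetical.

```python
def reject_by_bounds(cost_ext, v, q, tau, alpha, h):
    """Return True if (P3) or (P4) rejects an extension ending at vertex v.

    cost_ext : cost vector of the extension
    q        : dict, vertex -> tuple of per-cost minima to the target
    tau      : dict, cost index m -> upper bound on cost m
    alpha    : weight vector of the selection function in use
    h        : dict, vertex -> least alpha-weighted cost to the target
    """
    # (P3): the best achievable value of a bounded cost already exceeds its bound.
    for m, bound in tau.items():
        if cost_ext[m] + q[v][m] > bound:
            return True
    # (P4): applies only when every cost is bounded.
    if len(tau) == len(alpha):
        s_value = sum(a * c for a, c in zip(alpha, cost_ext)) + h[v]
        if s_value > sum(alpha[m] * tau[m] for m in tau):
            return True
    return False
```

To follow the updating strategy described above, a caller could lower the danger entry of `tau` each time a less dangerous path within the time limit is found, so that both tests tighten as the search proceeds.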

4.  APPLICATION  TO  SUBMARINE  TRANSIT  PATHS 

As  stated  earlier,  the  aim  is  to  direct  a  submarine  from  one  location  to  another, 
through  a  field  of  sensors  at  known  positions,  within  a  fixed  time  period  and  with 
minimum  probability  of  detection.  If  we  make  a  number  of  simplifying  assumptions, 
we  can  use  the  graph  searching  techniques  described  in  previous  sections  to  produce 
a  near-optimal  path. 


4.1.  Discrete  formulation 

For  simplicity,  we  assume  that  the  problem  is  two  dimensional,  in  the  sense  that 
the  object  is  constrained  to  move  in  a  bounded  two  dimensional  region  called  the 
transit  region.  We  place  a  grid  on  the  transit  region  so  that  the  start  and  goal 
locations  correspond  to  grid  points,  and  assume  that  the  grid  size  is  sufficiently 
large (for a submarine, no less than 1 km) that we may ignore such complications
as  ‘turning  circles’.  We  also  assume  that  the  object  is  constrained  to  move  in  a 
straight  line  at  constant  speed  from  one  grid  point  to  a  neighbouring  grid  point, 
and  that  only  a  finite  number  of  speeds  are  permitted. 

4.1.1.  The  graph  and  objective  functions 

The grid points form the vertices of a graph $\mathcal{G}$. There may be more than one
edge joining a pair of adjacent vertices, one for each of the permitted speeds of the
object. So it is useful to denote an edge by an ordered triple $(v, v', z)$ if it
connects vertices $v$ and $v'$ and the object is travelling $z$ units per second
between them.

Associated with each edge are costs corresponding to danger, or the probability of
detection by the sensors when traversing the edge, and time to cover that distance.
Both depend on the speed of the object and the grid size. The first cost function
$f_1(v, v', z)$ simply measures the time taken to traverse the edge $(v, v', z)$.

To measure the danger, we need to know the probability in a fixed time period of
detection at each point. To simplify things, we assume that the probability
$P_m(r, z)$ of detection in the fixed time period by the $m$th sensor depends only on
the distance $r$ of the object from the sensor and on its speed $z$, that it is
independent of the probability of detection at any other sensor, and does not change
with time. In practice, values of $P_m(r, z)$ are computed for a discrete set of
distances and speeds using a complicated model that may also take into account such
things as type and depth of sensor, type of submarine and local topography.

Under these assumptions, the probability of not being detected at position $x$ in
the fixed time period by any of the $M$ sensors is
$N(x, z) = \prod_{m=1}^{M} (1 - P_m(r_m, z))$, where $r_m$ is the object’s distance
from the $m$th sensor. We can use any one of a number of methods to determine an
average probability of non-detection $1 - \bar{P}(v, v', z)$ in the fixed time period
when travelling along the edge $(v, v', z)$, and from this we can calculate an
approximate probability of non-detection whilst on $(v, v', z)$. The probability of
non-detection on a path $p$ is then the product of the probabilities of non-detection
on the edges that make up $p$. Since we want an additive cost function, we take
logarithms. So the second cost function, which we have loosely called the danger, can
be defined as $f_2(v, v', z) = f_1(v, v', z) \, |\ln(1 - \bar{P}(v, v', z))|$. Since
$1 - \bar{P}(v, v', z) \in [0, 1]$ for each edge, we minimise the probability of
detection by minimising $f_2$.
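The two edge costs can be illustrated with a short Python sketch. Everything specific here is an assumption made for the example: the detection model `toy_model` is invented, time is measured in units of the fixed time period, and the average non-detection probability is estimated by sampling a few points along the edge, which is only one of the many possible averaging methods.

```python
import math

def non_detection(position, speed, sensors, p_detect):
    """Probability of evading every sensor for one time period at `position`,
    assuming the sensors detect independently."""
    prob = 1.0
    for sensor in sensors:
        r = math.dist(position, sensor)
        prob *= 1.0 - p_detect(r, speed)
    return prob

def edge_costs(v, w, speed, sensors, p_detect, samples=5):
    """Return (time, danger) for the straight edge from v to w at `speed`."""
    time = math.dist(v, w) / speed
    # Average the non-detection probability over sample points on the edge.
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        x = (v[0] + t * (w[0] - v[0]), v[1] + t * (w[1] - v[1]))
        total += non_detection(x, speed, sensors, p_detect)
    average = total / samples
    danger = time * abs(math.log(average))
    return time, danger

def toy_model(r, speed):
    """Invented detection model: louder (faster) and closer means more
    detectable, capped at 0.9 so that the logarithm stays finite."""
    return min(0.9, 0.1 * speed / (1.0 + r * r))
```

Because the danger is a sum of logarithms, adding it along a path and taking `math.exp(-total_danger)` recovers the product of the per-edge non-detection probabilities.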

4.2.  Solving  the  discrete  problem 

To find the Pareto-optimal path $p^*$ that satisfies a time constraint and gives the
lowest probability of detection, we use a version of the algorithm that incorporates
the modifications described above. We assume that $\tau$, the maximum time allowed to
move from the start to the goal location, is achievable but less than the minimum
time required for the least dangerous paths from $s$ to $t$; that is,
$q_1(s) \leq \tau < \min\{c_1(p) : c_2(p) = q_2(s)\}$.

We add a step to the Pareto-algorithm which enables us to choose a selection
function $S_a = S_{(a, 1-a)}$ that leads to the desired solution more quickly. As a
guide to how to do this, we observe that $c_1(p_a)$ decreases as $a$ increases. So we
may do a binary search on the parameter $a$, using Dijkstra’s algorithm, to identify an
approximately optimal value of $a$, in the sense that the corresponding path $p_a$ has
time cost at most $\tau$ and lowest possible danger amongst all those paths constructed.
This value of $a$ determines the selection function $S_a$ used in the Pareto algorithm.
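The binary search on the weight parameter can be sketched as follows. This is an illustration under assumptions: the toy graph, the fixed number of bisection steps and the function names are all hypothetical, and the time limit is assumed to be achievable.

```python
import heapq

def best_path_cost(edges, source, target, a):
    """Dijkstra for the scalar weight a*time + (1 - a)*danger.
    Returns the (time, danger) cost vector of the path found."""
    best = {source: (0.0, (0.0, 0.0))}
    heap = [(0.0, source, (0.0, 0.0))]
    while heap:
        w, v, cost = heapq.heappop(heap)
        if w > best[v][0]:
            continue  # stale queue entry
        if v == target:
            return cost
        for tail, head, (t, d) in edges:
            if tail != v:
                continue
            nw = w + a * t + (1 - a) * d
            if nw < best.get(head, (float("inf"),))[0]:
                ncost = (cost[0] + t, cost[1] + d)
                best[head] = (nw, ncost)
                heapq.heappush(heap, (nw, head, ncost))
    return None

def choose_a(edges, source, target, tau, iterations=20):
    """Bisect on a in [0, 1]; keep the least dangerous path seen whose time
    is at most tau. Returns (a, (time, danger))."""
    lo, hi = 0.0, 1.0          # a = 1 weights time only
    best = (hi, best_path_cost(edges, source, target, hi))
    for _ in range(iterations):
        mid = (lo + hi) / 2
        cost = best_path_cost(edges, source, target, mid)
        if cost[0] <= tau:     # fast enough: try putting more weight on danger
            hi = mid
            if cost[1] < best[1][1]:
                best = (mid, cost)
        else:                  # too slow: put more weight on time
            lo = mid
    return best

# Toy multigraph (assumed data): (tail, head, (time, danger)).
EDGES = [
    ("s", "a", (1.0, 4.0)), ("s", "a", (3.0, 1.0)), ("a", "t", (2.0, 2.0)),
    ("s", "t", (6.0, 1.0)), ("s", "b", (1.0, 1.0)), ("b", "t", (1.0, 8.0)),
    ("b", "a", (1.0, 1.0)), ("s", "t", (7.0, 7.0)),
]
A_BEST, COST_BEST = choose_a(EDGES, "s", "t", tau=4.5)
```

With a time limit of 4.5 on this toy graph the search settles on a path of cost (4, 4), and the returned weight would then define the selection function for the full algorithm.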

4.3.  Experimental  results 

Efficiency of computation may be improved by adopting a hierarchical or
multi-resolution approach. An approximate path $p^{(1)}$ is obtained using a coarse
grid on the transit region. This grid is then restricted to a subregion that includes
$p^{(1)}$ and refined. In this way, a sequence of approximate paths $p^{(n)}$ is
determined. As $n$ increases, the submarine is given increased flexibility to change
direction and speed. Efficiency can also be improved using some of the more
sophisticated data management techniques available when coding the algorithm.

The attached figure shows approximately optimal paths obtained for one sensor
geometry with different time constraints, under the assumption that the probability
model for each sensor is the same and that only two speeds were possible. Both
paths are shown on the danger-map corresponding to the lower speed, and the
darker sections of each path correspond to higher speeds. A hierarchical approach
was used on a transit region of 62 km x 62 km, starting with a grid with spacing
3.26 km and ending after the second iteration with a grid with spacing 1.63 km. The
longest CPU time for path construction was less than 1 second.

This technique can be used to deal with situations in which three dimensional
motion is allowed, and directionally biased sensors are used. However, the appropriate
probabilities must be available and there will be a heavier computational burden.
We could also add further constraints (pruning criteria) to allow for such
requirements as the need for a diesel submarine to recharge its batteries near the
surface, thus increasing its probability of detection, after extended periods submerged.

Figure 2: Approximate safe paths with time constraints of (i) 500 min (ii) 428 min

We outline the basic framework of quantum computing, and describe
some key quantum algorithms and their applications. We also identify
the crucial role of the Fourier transform in some of these algorithms. It
is reasonable to expect that new, more powerful algorithms can be found
which further exploit the Fourier transform in its various forms.


1.  ORIGINS 

Simulating quantum mechanical processes on conventional computers is generally
computationally infeasible. Such a simulation typically involves an exponential
slowdown in time compared to the evolution itself, since the amount of information
needed to describe the evolving quantum system in classical terms grows
exponentially with time. However, in 1981 Feynman [7] suggested that it ought to be
possible to turn this around and treat it as an opportunity rather than an obstacle.
He argued that by regarding the measurements obtained from experiments carried
out on certain types of quantum mechanical devices as the results of complex
computations, it might be possible to perform certain computational tasks beyond the
reach of any conceivable conventional computer.

This  vision  of  Feynman  has  stimulated  research  in  various  directions,  including, 
of  course,  the  search  for  ways  of  building  devices  that  can  function  usefully  as 
quantum  computers.  But  is  this  really  worth  doing?  Would  quantum  computers 
be  significantly  more  powerful  than  conventional  computers?  Benioff  [1]  showed  as 
early  as  1980  that  any  computation  that  could  be  done  by  a  conventional  computer 
could,  in  principle,  also  be  done  by  a  quantum  computer.  Within  the  next  ten  or 
so  years  formal  models  of  quantum  computing  had  been  developed  and  a  number 
of  contrived  problems  that  could  be  solved  more  efficiently  on  quantum  computers 
had  been  discovered  [6],  [2]  [12]. 

However, the first real breakthrough came in 1994 when Shor [13] presented an
efficient quantum algorithm for factorization. This generated a great deal of
excitement because no such conventional algorithm is known. It is widely suspected
(but not proven) that none exists, although factorization is not known to be NP-hard.
Indeed the very difficulty of factorizing large numbers is the key to the security of
many commercially used encryption schemes. These will be rendered useless, of course,
if the implementation of Shor’s algorithm on quantum computers becomes a reality.

The next big step was taken in 1996 by Grover [8], [9], who presented a quantum
search algorithm which gave a quadratic improvement over existing search methods.
He has since adapted the key idea of his search algorithm to develop efficient
quantum algorithms for solving problems not immediately related to searching [10].

In  recent  years  efforts  have  been  made  to  unify  known  quantum  algorithms  within 
various  frameworks.  For  example,  in  1995  Kitaev  [11]  recognized  the  crucial  role 
of  the  Fourier  transform  of  various  finite  Abelian  groups,  and  exploited  this  in 
developing  other  algorithms  for  solving  group-related  problems.  On  the  other  hand, 
in  a  very  recent  paper  [3]  interesting  and  potentially  powerful  links  were  made 
between  quantum  algorithms  and  multi-particle  interferometer  experiments. 


2.  MODELS  FOR  COMPUTATION 

All models of ‘reasonable’ conventional computers are equivalent, in the sense
that any universal machine can simulate any other machine with at most a
‘polynomial’ slowdown. It suits our purpose here to describe conventional computers
as devices that store binary digits (bits) in locations known as registers. These
bits can be read (measured) at any time and they can be used as the inputs for
Boolean functions. The outputs of these functions can then be stored in the
registers, overwriting the existing values if necessary. In this setting a conventional
algorithm is merely a sequence of Boolean functions, together with the appropriate
bookkeeping that tracks the locations of the inputs and outputs of these functions.
These functions are usually Boolean gates which act on just a few bits at a time.
Such an algorithm can be represented graphically as a Boolean network or circuit.
We apply an algorithm by prescribing the initial state of the registers (i.e. a binary
string describing the values of the stored bits), and then applying the functions
in the prescribed order. The output of the algorithm is just the final state of the
registers.

Many commonly used Boolean functions (such as AND) are not 1-1. So the
algorithms described above are not reversible, in the sense that we cannot always
recover the initial state from the final one. However, by including extra bits (in
registers commonly called scratch pads), we can assume that we are working with
1-1 functions and our algorithms are reversible. To see this, let $B^{(n)}$ denote the
set of all Boolean $n$-strings $x = x_1 x_2 x_3 \ldots x_n$, where
$x_i \in B = \{0, 1\}$ for each $i$, and suppose that $f : B^{(n)} \to B^{(m)}$. Then
the function $F : B^{(n+m)} \to B^{(n+m)}$ defined by $F(x, y) = (x, y \oplus f(x))$,
where $\oplus$ denotes the bitwise exclusive or (XOR), is 1-1.
Since the evolution of isolated quantum systems is reversible, models of reversible
computing are more adaptable to quantum computation.
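The scratch-pad construction is easy to check on a small case. The following Python sketch encodes bit strings as integers and applies the construction to the two-bit AND function; the helper names are hypothetical.

```python
def make_reversible(f):
    """Return F(x, y) = (x, y XOR f(x)) for a function f on bit strings
    encoded as Python integers."""
    def F(x, y):
        return x, y ^ f(x)
    return F

def AND(x):
    """AND of the two bits of x, where x is an integer from 0 to 3."""
    return (x >> 1) & (x & 1)

F = make_reversible(AND)

# All eight (x, y) inputs map to distinct outputs, so F is a bijection.
IMAGES = {F(x, y) for x in range(4) for y in range(2)}
```

Applying `F` twice returns the original pair, because XOR with the same value cancels; the construction is therefore its own inverse.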

Whereas the fundamental unit of conventional information is the bit, the
corresponding unit of quantum information is the quantum bit or qubit. The quantum
state of a single qubit system is a unit vector $|\psi\rangle$ in a two-dimensional
complex inner product space, which we denote by $\mathcal{B}$ or $\mathcal{B}^{(1)}$.
The space $\mathcal{B}$ has a distinguished orthonormal basis whose elements are
labelled $|0\rangle$ and $|1\rangle$, and so the state $|\psi\rangle$ of the qubit can
be expressed as a linear combination

$|\psi\rangle = a|0\rangle + b|1\rangle$, where $a, b \in \mathbb{C}$ and $|a|^2 + |b|^2 = 1$. (1)

A measurement of $|\psi\rangle$ can be regarded as a projection onto one or other of
the basis vectors $|0\rangle$ or $|1\rangle$. The outcome is not deterministic. In fact
we obtain the result $|0\rangle$ with probability $|a|^2$ and $|1\rangle$ with
probability $|b|^2$.

The state of an $n$-qubit quantum system is a unit vector $|\psi\rangle$ in a complex
inner product space of dimension $2^n$, which we denote by $\mathcal{B}^{(n)}$. This can
be regarded as a tensor product of $n$ copies of $\mathcal{B}$. The distinguished basis
of the quantum state space $\mathcal{B}^{(n)}$ consists of all states in which each
qubit has a definite value, either $|0\rangle$ or $|1\rangle$. These states are known
as the classical states of the system, and can be labelled as binary $n$-strings
$|x\rangle = |x_1 x_2 \ldots x_n\rangle$, where each $x_j \in B$. Thus if $B^{(n)}$
denotes the set of all binary $n$-strings we can express $|\psi\rangle$ in the form

$|\psi\rangle = \sum_{x \in B^{(n)}} a_x |x\rangle$, where each $a_x \in \mathbb{C}$ and $\sum_{x \in B^{(n)}} |a_x|^2 = 1$. (2)

If we measure all $n$ qubits of the system in parallel, the state of the system becomes
one of the classical states, and the probability of obtaining any one such state
$|x\rangle$ is $|a_x|^2$.
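A conventional simulation of such a measurement is straightforward once the amplitudes are stored explicitly. The Python sketch below samples a classical state from a list of complex amplitudes; the function name and the choice of indexing classical states by integers are assumptions of the example.

```python
import random

def measure(amplitudes, rng=random):
    """Sample the index x of a classical state with probability equal to the
    squared modulus of its amplitude. The list has length 2**n for n qubits."""
    probs = [abs(a) ** 2 for a in amplitudes]
    assert abs(sum(probs) - 1.0) < 1e-9, "state must be a unit vector"
    return rng.choices(range(len(amplitudes)), weights=probs)[0]

# Two-qubit example: equal weight on the strings 00 and 11, with a relative
# phase that does not affect the outcome probabilities.
STATE = [2 ** -0.5, 0, 0, 1j * 2 ** -0.5]
outcome = format(measure(STATE), "02b")   # either "00" or "11"
```

The list of amplitudes doubles in length with each added qubit, which is the storage cost referred to later in this section.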

A  quantum  algorithm  can  be  described  in  the  following  simple  manner.  We  start 
with the $n$ qubits in a classical initial state such as $|000 \ldots 0\rangle$ and then apply a
unitary  transformation.  This  is  usually  a  product  of  standard  quantum  gates  that 
act  on  just  a  few  qubits  at  a  time.  The  output  of  the  computation  is  then  obtained 
by  measuring  some  or  all  of  the  qubits. 

The  probabilistic  nature  of  the  output  of  a  quantum  algorithm  is  an  important 
difference  from  conventional  computing.  Unless  the  final  state  is  one  of  the  classical 
states  rather  than  a  superposition,  repetitions  of  a  quantum  algorithm  can  produce 
different  results. 

Conventional computers can store and rotate vectors, and can simulate the quantum
measuring process of projecting onto mutually orthogonal axes. So conventional
computers can do anything quantum computers can do. The difference is in
the speed and storage requirements. For example, merely to represent on a
conventional computer a typical state $|\psi\rangle$ of an $n$-qubit quantum system, we
need to store the $2^n$ coefficients $a_x$ in (2).

3.  ALGORITHMS 

The first quantum algorithms were given by Deutsch [4] [5] [6], and were designed
to determine whether a given Boolean function possesses certain global properties
(i.e. joint properties of all the function values). The aim is to use a minimum number
of function evaluations. By concentrating on global properties the algorithms
are attempting to exploit the parallelism inherent in quantum mechanics.

3.0.1. Deutsch’s algorithm

The algorithm now known as Deutsch’s algorithm has evolved since its first
appearance in 1985. In its most recent and powerful form we are given a function
$f : B^{(n)} \to B$, and are told that $f$ is either constant or balanced, that is, the
values of $f$ are either all the same or there are equally many zeroes and ones
($2^{n-1}$ of each, in fact). The problem is to decide whether $f$ is constant or
balanced. Any conventional algorithm may require as many as $2^{n-1} + 1$ function
evaluations, but Deutsch’s algorithm requires only one application of $U_f$, the
quantum version of $f$, and $O(n)$ applications of standard quantum gates and register
measurements.
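The structure of the algorithm can be seen in a small state-vector simulation. The Python sketch below is an illustration under assumptions: it uses the phase form of the oracle, in which the single application of the quantum version of the function multiplies the amplitude of each classical state x by (-1)^f(x), and it runs on a conventional machine, so the simulation itself evaluates the function at all 2^n inputs. The function names are hypothetical.

```python
import math

def hadamard_all(state):
    """Apply a Hadamard gate to every qubit (a Walsh-Hadamard transform)."""
    state = list(state)
    size = len(state)
    step = 1
    while step < size:
        for start in range(0, size, 2 * step):
            for i in range(start, start + step):
                a, b = state[i], state[i + step]
                state[i], state[i + step] = a + b, a - b
        step *= 2
    norm = math.sqrt(size)
    return [amp / norm for amp in state]

def deutsch_jozsa(f, n):
    """Decide whether f, promised constant or balanced on n-bit inputs,
    is constant, using one simulated oracle application."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0                                    # start in |00...0>
    state = hadamard_all(state)                       # uniform superposition
    state = [(-1) ** f(x) * amp for x, amp in enumerate(state)]  # oracle
    state = hadamard_all(state)
    p_zero = state[0] ** 2        # probability of measuring |00...0>
    return "constant" if p_zero > 0.5 else "balanced"
```

For a constant function all amplitudes interfere constructively at the all-zero string, so it is measured with certainty; for a balanced function they cancel there exactly.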

The advantage of this quantum algorithm over conventional methods is lost, however,
if we allow errors into our computations. If we are satisfied to guess whether
$f$ is balanced or constant on the basis of a number of function evaluations, then we
can guess correctly with probability $1 - \varepsilon$ using $O(\log(1/\varepsilon))$
evaluations. Since we can have an exponentially good probability of success with just
polynomially many trials, the problem cannot really be regarded as hard.
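The conventional guessing strategy can be written in a few lines of Python. In this sketch the function name and the use of sampling with replacement are assumptions of the example.

```python
import random

def guess_constant_or_balanced(f, n, trials, rng=random):
    """Evaluate f at `trials` random n-bit inputs and guess its type."""
    first = f(rng.randrange(2 ** n))
    for _ in range(trials - 1):
        if f(rng.randrange(2 ** n)) != first:
            return "balanced"   # two different values: certainly not constant
    # Reached only if every sampled value agreed with the first one.
    return "constant"
```

If the function is balanced, each later sample agrees with the first with probability 1/2, so the guess is wrong with probability 2^(1 - trials); about 1 + log2(1/ε) evaluations therefore suffice for an error of at most ε.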

3.0.2.  Simon’s  algorithm 

In 1993 Bernstein and Vazirani [2] gave a variation of Deutsch’s problem which
is hard conventionally and which showed for the first time that quantum computation
is significantly more powerful than conventional computation. Simon [12]
soon followed with a simpler example. Simon’s algorithm has turned out to be
quite significant since it established a pattern that has been subsequently used and
generalized.

In Simon’s problem we are given a Boolean function $f : B^{(n)} \to B^{(n)}$ which
we are told is 2-1 and has unknown period $\xi$, i.e. for distinct
$x, y \in B^{(n)}$, $f(x) = f(y)$ if and only if $y = x \oplus \xi$. The problem is to
determine the period $\xi$. Since $f$ has $2^{n-1}$ distinct values, if we try to solve
the problem simply by searching through the values of $f$ we may need as many as
$2^{n-1} + 1$ values before we find a match (and hence the period $\xi$). A similar
exponential number of evaluations is needed in order to guarantee any given
probability of success if the observations are noisy. However, the quantum algorithm
presented by Simon achieves a given probability of success in determining $\xi$ with
$O(n)$ observations, even in the presence of noise.
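The conventional search for a match is sketched below in Python, with bit strings encoded as integers. The example function built by `make_simon_function` is invented for the illustration, and the inputs are simply tried in increasing order.

```python
def find_period_classically(f, n):
    """Evaluate f at 0, 1, 2, ... until two inputs give the same value.
    Returns (period, number of evaluations used)."""
    seen = {}
    for evaluations, x in enumerate(range(2 ** n), start=1):
        value = f(x)
        if value in seen:
            return x ^ seen[value], evaluations
        seen[value] = x
    return None, 2 ** n

def make_simon_function(xi):
    """An invented 2-1 function with f(x) = f(x XOR xi): each pair
    {x, x XOR xi} is sent to its smaller member."""
    return lambda x: min(x, x ^ xi)
```

With n = 3 and a period whose leading bit is set, the first four inputs all give different values, so the search needs the worst-case 2^(n-1) + 1 = 5 evaluations.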

3.0.3.  Shor’s  algorithm 

Shor’s algorithm [13] is a method for factorizing a given positive integer $N$. It
does this by solving an equivalent problem, that of finding the order of any number
$y$ coprime to $N$. (This is the least positive integer $r$ for which
$y^r \equiv 1 \bmod N$.) The algorithm has essentially the same formal structure as
Simon’s algorithm, but it uses a quantum version of the discrete Fourier transform
(DFT). A key ingredient of Shor’s algorithm is the use of an efficient quantum circuit
for evaluating the DFT for the additive group $\mathbb{Z}_N$ of integers mod $N$. The
quantum Fourier transform (QFT) for $\mathbb{Z}_N$ requires $O((\log N)^2)$ steps, which
is an exponential improvement on both the standard method ($O(N^2)$) and the fast
Fourier transform ($O(N \log N)$).
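The conventional part of the reduction from factoring to order finding can be illustrated in Python. In this sketch the order is found by brute force, which takes time exponential in the number of digits of N and stands in for the quantum step; the function names are hypothetical.

```python
from math import gcd

def order(y, N):
    """Least r > 0 with y**r congruent to 1 mod N, found by brute force."""
    if gcd(y, N) != 1:
        raise ValueError("y must be coprime to N")
    r, power = 1, y % N
    while power != 1:
        power = (power * y) % N
        r += 1
    return r

def factor_from_order(y, N):
    """Try to split N using the order of y. Returns a pair of non-trivial
    factors, or None if this choice of y does not help."""
    r = order(y, N)
    if r % 2 == 1:
        return None                 # odd order: try another y
    half = pow(y, r // 2, N)        # a square root of 1 mod N, not equal to 1
    if half == N - 1:
        return None                 # trivial square root: try another y
    p = gcd(half - 1, N)
    return p, N // p
```

For example, 7 has order 4 modulo 15, and 7^2 = 49 is congruent to 4, so gcd(3, 15) = 3 yields the factorization 15 = 3 x 5.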

3.0.4.  Grover’s  algorithm 

In its original form Grover’s algorithm gave a method of speeding the identification
of a particular object in a large data base. If the data base has size $N$, then
it takes $N/2$ look-ups to be guaranteed a 50% chance of finding the one we want.
Grover [8] presented a quantum algorithm which is almost certain to succeed after
$O(\sqrt{N})$ ‘observations’.

In the quantum setting of Grover’s algorithm, the unknown object is represented
as a ‘marked’ classical state $\omega$ in $\mathcal{B}^{(n)}$, where $N = 2^n$. The
algorithm starts by preparing the balanced superposition
$s = \frac{1}{\sqrt{N}} \sum_{x \in B^{(n)}} |x\rangle$. If we measure the system now,
the chance of obtaining $\omega$ is merely $1/N$. The key to the algorithm is a unitary
operator $U$ which rotates this state towards $\omega$. This operator $U$ is the
product of two reflections. The first of these is the quantum equivalent of the
characteristic function of $\omega$, and the second reflection is associated with the
superposition $s$. After $\pi\sqrt{N}/4$ applications of $U$, the state of the system
is very close to $\omega$, and so a measurement will now almost certainly produce
$\omega$.
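The rotation towards the marked state can be followed in a small state-vector simulation. The Python sketch below stores all N real amplitudes explicitly, so it illustrates the two reflections rather than the quantum speed-up; the function name and the rounding down of the iteration count are choices of the example.

```python
import math

def grover(n, marked):
    """Simulate Grover's iteration on n qubits with one marked index.
    Returns (number of iterations, probability of measuring the marked state)."""
    N = 2 ** n
    state = [1 / math.sqrt(N)] * N                 # balanced superposition
    iterations = math.floor(math.pi * math.sqrt(N) / 4)
    for _ in range(iterations):
        # First reflection: flip the sign of the marked amplitude.
        state[marked] = -state[marked]
        # Second reflection: reflect every amplitude about the mean, which
        # is a reflection about the balanced superposition.
        mean = sum(state) / N
        state = [2 * mean - amp for amp in state]
    return iterations, state[marked] ** 2
```

With n = 10 (N = 1024) the loop runs 25 times and the marked state is then measured with probability above 0.99, compared with 1/1024 before any iteration.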

Since Grover’s algorithm gives a quadratic improvement over conventional search
methods, the question naturally arises as to whether quantum computers have the
potential to do even better. In other words, is there a quantum search algorithm
that requires significantly fewer than $\pi\sqrt{N}/4$ observations? It turns out that
no such algorithm exists. In order to distinguish all possible values of the marked
state, it is necessary to have at least $\sqrt{N/2}$ observations [14]. Grover’s
algorithm exceeds this bound by less than 12%.
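The quoted margin can be checked directly, assuming that the lower bound is read as $\sqrt{N/2}$ and the iteration count as $\pi\sqrt{N}/4$:

```latex
\frac{\pi\sqrt{N}/4}{\sqrt{N/2}} \;=\; \frac{\pi\sqrt{2}}{4} \;\approx\; 1.1107 ,
```

so under this reading the number of observations used by Grover’s algorithm is about 11.1% above the bound, independently of $N$.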

This  algorithm  has  since  been  adapted  to  develop  efficient  quantum  algorithms 
for  estimating  means  and  standard  deviations  of  statistical  distributions  [10]. 

4.  FOURIER  TRANSFORMS 

It has recently been recognized [3], [11] that all of these key quantum algorithms,
except for Grover’s, solve problems that can be expressed in group theoretic terms
as examples of finding ‘hidden subgroups’ of a finite Abelian group. In each case
the algorithm begins by preparing a superposition of states using the quantum
version of the appropriate Fourier transform. For the algorithms of Deutsch and
Simon, the underlying group is $\mathbb{Z}_2^n$ and the Fourier transform is a
Walsh-Hadamard transformation. For Shor’s algorithm $\mathbb{Z}_N$ is the underlying
group, and in this case we use the quantum equivalent of the standard discrete Fourier
transform. The point of the Fourier transform is that it is useful in recognizing
periodicity. The role of periodicity is plainly visible in Shor’s problem, but it has
also been shown [3] to be present in disguised forms in the problems of Deutsch and
Simon. In [11] Kitaev devised new quantum algorithms for computing stabilizers of
actions of finite Abelian groups. These too were based on quantum versions of the
appropriate Fourier transform, for which he also gave efficient algorithms.
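The way a Fourier transform exposes a hidden period can be seen in a small conventional computation. The Python sketch below applies an unnormalised Walsh-Hadamard transform to the indicator of a pair of strings that differ by a period; the particular period, the starting string and the function names are assumptions of the example.

```python
def walsh_hadamard(values):
    """Unnormalised Fourier transform over the group of n-bit strings under
    XOR. The input length must be a power of two."""
    out = list(values)
    step = 1
    while step < len(out):
        for start in range(0, len(out), 2 * step):
            for i in range(start, start + step):
                a, b = out[i], out[i + step]
                out[i], out[i + step] = a + b, a - b
        step *= 2
    return out

def parity(x):
    """Number of set bits of x, modulo 2."""
    return bin(x).count("1") % 2

n, xi = 3, 0b101
# Indicator of the pair {x0, x0 XOR xi}, as arises after measuring a value
# of a function with period xi.
signal = [0] * 2 ** n
signal[0b010] = signal[0b010 ^ xi] = 1
spectrum = walsh_hadamard(signal)
support = [y for y, c in enumerate(spectrum) if c != 0]
```

The transform is non-zero only at strings y whose bitwise inner product with the period is even, so each such y supplies one linear constraint on the period, whatever the starting string was.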

As  noted  in  [3],  we  can  expect  to  find  other  problems  associated  with  the  subgroup 
structure  of  groups  that  turn  out  to  be  amenable  to  efficient  quantum  computation. 

REFERENCES 

1. Benioff, P. Quantum mechanical Hamiltonian models of Turing machines, J. Stat. Phys. 29
(1982) 515.

2. Bernstein, E. & Vazirani, U. Quantum complexity theory, Proc. 25th ACM Symp. on the Theory
of Computing (1993) 11-20.

3. Cleve, R., Ekert, A., Henderson, L., Macchiavello, C. & Mosca, M. On quantum algorithms, (1999)
lanl e-print 990361

4. Deutsch, D. Quantum theory, the Church-Turing principle and the universal quantum
computer, Proc. Roy. Soc. London Ser. A, 400 (1985) 97.

5. Deutsch, D. Quantum computational networks, Proc. Roy. Soc. London Ser. A, 425 (1989)
73.

6. Deutsch, D. & Jozsa, R. Rapid solution of problems by quantum computation, Proc. Roy. Soc.
London Ser. A, 439 (1992) 553-558.

7. Feynman, R.P. Simulating physics with computers, Int. J. of Theoretical Physics, 21 (1982)
467-488.



8.  Grover  L.K.  A  fast  quantum  mechanical  algorithm  for  database  search,  Proc.  28th  Annual 
Symposium  on  the  Theory  of  Computing,  (1996)  212-219. 

9.  Grover  L.K.  Quantum  mechanics  helps  in  searching  for  a  needle  in  a  haystack,  Phys.  Rev. 
Letters  76(2),  (1996)  325-328. 

10.  Grover  L.K.  A  framework  for  fast  quantum  mechanical  algorithms,  Proc.  30th  Annual  Symp. 
on  the  Theory  of  Computing  (1998),  lanl  e-print  9711043. 

11.  Kitaev,  A.Yu.  Quantum  measurements  and  the  abelian  stabilizer  problem,  lanl  e-print  9511026. 

12. Simon, D. On the power of quantum computation, Proc. 35th Annual Symp. on the Foundations
of Computer Science (1994), 116-123.

13. Shor, P.W. Algorithms for quantum computation; discrete log and factoring. Proceedings of
the 35th Annual Symp. on the Foundations of Computer Science (1994) 124.

14.  Zalka  C.,  Grover’s  Quantum  Searching  Algorithm  is  Optimal,  lanl  e-print  9711070 




Object Enhancement in Time-Frequency Scans of
Communications Environments

In this work we apply a nonlinear morphological filter with a one-dimensional
structure element to the task of improving the "visibility" of several signals of interest
(SOI). Two dimensional time-frequency scans representing a series of wideband
snapshots of a communications environment are analysed. The morphological filter
approach is compared with conventional thresholding and with no filtering. It is
found that the morphological filter can improve the SOI visibility at specified levels of
environment noise. Signals considered include frequency hopping and time co-incident
multi-tone transmissions.


Key  Words:  Morphological  filters;  time-frequency  scans 

0.  INTRODUCTION 

The  analysis  of  how  spectral  information  varies  over  time  has  been  the  source  of  much  research  in  recent 
times  (see  [1]  for  a  recent  review).  Much  of  the  work  has  been  aimed  at  developing  transforms  which  may 
be  applied  to  time  domain  signals  and  thereby  produce  a  set  of  characteristic  features  in  the  time-frequency 
domain.  However,  in  wideband  scans  of  communications  environments,  signals  of  interest  are  often 
represented  as  an  increase  in  energy  in  a  single  frequency  bin  in  the  spectral  FFT  “snapshot”.  This  coarse 
resolution,  necessary  to  capture  wideband  environments,  means  that  employing  conventional  transforms 
provides  little  benefit  in  detecting  the  presence  of  a  SOI  when  compared  with  the  observation  of  the  level 
of  spectral  energy  at  specific  frequencies  and  times. 

Representation of these wideband scans as a two dimensional rasterised image presents visual observers
with, essentially, a task in image recognition. The task becomes one of detecting the presence or absence of
a SOI based on the carrier level representation observed in the scan. Further observations such as co-incident
transmissions or transmissions related in time or frequency may also further characterise a
particular SOI.

In  this  work  we  have  applied  morphological  filter  techniques  to  the  task  of  improving  the  “visibility”  of 
several  signals  of  interest  present  in  two-dimensional  time-frequency  scans  of  a  communication 
environment. 


1.  MORPHOLOGICAL  FILTERS 

Morphological  filters  are  well  known  in  the  field  of  image  processing  [2,3,4].  Essentially,  they  are  a  class 
of nonlinear filters capable of enhancing the features and boundaries present in noise contaminated objects.

As noted in the literature [2], a nonlinear closure filter can be defined for any linear space (e.g. $R^2$ and $R^3$).
Closing of a set X by B is defined as

$(X \oplus B) \ominus B$  (1)

Here, X represents the image and B is the set called the "structure element". The operators $\oplus$ and $\ominus$ are,
respectively, the Minkowski sum and difference operators [2]. The operator $\oplus$ is also called the dilation of
X by B, i.e.:

$X \oplus B = \{ h \mid (B')_h \cap X \neq \emptyset \}$  (2)

Here, B' is the reflection of set B, i.e.

$B' = \{ -b : b \in B \}$  (3)

Dilation of X by B is thus the set of all translations h for which the translated reflection of the
structure element, $(B')_h$, has a non-empty intersection with X. Erosion of X by B may be defined by

$X \ominus B = \{ h \mid (B)_h \subseteq X \}$  (4)

Erosion of X by B is the set of all translations h of the structure element B which are contained within set
X. We can see from (1), therefore, that closure can be thought of as the dilation of set X by B followed by
the erosion of the result by B.

Usually, the structure element B is a sphere (in $R^3$) or disc (in $R^2$). Hence, applying the closure filter is
analogous to rolling or sliding the sphere or disc around the outside of the object for dilation, then around
the inside of the dilated object for erosion, thereby altering the "morphology" of the original object.

Closure also has the property of idempotence, which means that applying the closure operation more than
once has no further effect on the result.
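
A minimal sketch of definitions (1) to (4) for finite sets of integer positions may help make the closure operation concrete. The function names and the sample sets X and B are illustrative assumptions, not taken from the paper; only the set definitions themselves come from the text.

```python
def dilate(X, B):
    """Dilation (2): every h = x + b, i.e. every shift at which the reflected B meets X."""
    return {x + b for x in X for b in B}


def erode(X, B):
    """Erosion (4): every shift h for which the translated element B_h lies inside X."""
    b0 = next(iter(B))                  # any element of B fixes the candidate shifts
    candidates = {x - b0 for x in X}
    return {h for h in candidates if all(h + b in X for b in B)}


def closing(X, B):
    """Closure (1): dilation by B followed by erosion by B."""
    return erode(dilate(X, B), B)


X = {0, 1, 3, 4}      # an "object" with a one-sample gap at position 2 (assumed data)
B = {-1, 0, 1}        # three-sample structure element (assumed for illustration)
closed = closing(X, B)
print(sorted(closed))                 # the gap at position 2 is filled
print(closing(closed, B) == closed)   # idempotence: a second pass changes nothing
```

In this toy case the closure fills the single-sample gap and leaves the rest of the object untouched, which is the behaviour exploited later for short fades.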

2. A MORPHOLOGICAL FILTER FOR NARROWBAND AND FREQUENCY HOPPING SIGNALS

In the case of two dimensional time-frequency scans of a communications environment, spectral signatures
of  various  emissions  can  be  viewed  as  individual  noise  contaminated  objects  which  vary  as  a  function  of 
time.  For  narrowband  emissions,  with  an  appropriate  selection  of  FFT  size,  the  object  can  be  contained 
entirely  within  one  “spectral  bin”.  When  this  is  done  there  is  no  (or  negligible)  correlation  between  adjacent 
frequency  bin  components  at  any  given  time  t.  Hence,  we  can  define  a  simplified  structure  element  as  the 
one  dimensional  column  vector  B,  of  dimension  M  x  1.  In  this  work,  M  is  defined  as  the  “window-length” 
of  the  morphological  filter. 

For the case of M = 3 we have:

$B = [1 \; 1 \; 1]^{T}$  (5)


The  application  of  a  closure  filter  to  a  2D  time-frequency  scan  using  this  structure  element  involves  sliding 
B  along  M  adjacent  time  samples  in  the  scan  and  performing  a  dilation.  This  is  then  followed  by  a  scan 
with  an  erosion  operation,  as  defined  in  (4)  above,  again  using  B.  In  this  present  work  the  closure  operation 
on  the  scan  has  been  augmented  by  a  number  of  other  operations. 

Firstly, a data reduction step is carried out where the noise floor of the scan is estimated and then an
appropriate threshold level set. The threshold level (T), below which data is removed, is set at

$T = \mathrm{mean}(X) + 2\sigma$  (6)

Here $\sigma$ is the standard deviation of the distribution of background noise levels in the scan. The effect of
thresholding is to remove much of the low-level background noise in the scan and leave only candidate
objects of level greater than 2 standard deviations above the noise floor mean. Following the data reduction
by thresholding, a modified dilation operation is applied.
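
The data reduction step can be sketched as follows. This is only an illustration under stated assumptions: the mean and standard deviation are estimated here from all elements of the scan, whereas the paper estimates the noise floor in a way it does not detail, and the function name `threshold_scan` and the toy scan are hypothetical.

```python
import statistics


def threshold_scan(scan, k=2.0):
    """Zero every element below T = mean + k * std, with k = 2 as in (6).

    Assumption: the statistics are taken over the whole scan as a stand-in
    for the paper's noise floor estimate.
    """
    values = [v for row in scan for v in row]
    T = statistics.mean(values) + k * statistics.pstdev(values)
    reduced = [[v if v >= T else 0.0 for v in row] for row in scan]
    return T, reduced


# Toy scan: 4 successive spectra of 20 bins, flat background plus a tone in bin 5.
scan = [[10.0 if f == 5 else 1.0 for f in range(20)] for _ in range(4)]
T, reduced = threshold_scan(scan)
print(round(T, 3))
print(reduced[0])
```

With the tone occupying one bin in twenty, the threshold falls between the background level and the tone level, so only the tone survives.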

In this work the dilation is preceded by setting

$(X \cap B)' = \max(X \cap B)$  (7)

for each translation h of B applied to X. This effectively sets each element of the intersection equal to the
maximum in that intersection. The dilation is then applied to the $(X \cap B)'$ term. The purpose of this
operation is to reduce the effect of short duration fades in any narrowband object present in the image.

During the erosion phase, the application of B with M=3 effectively removes objects of less than three time
divisions duration. This "decluttering" of the image is designed to remove short duration spurious
interferers and higher level background impulsive noise present in the scan.
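
One possible reading of the fade filling and decluttering steps, applied to the time history of a single frequency bin after thresholding, is sketched here. It is not the authors' implementation: fade filling is written as a flat grey-scale closing (moving maximum followed by moving minimum), and decluttering is written as an explicit run-length test, which is a substitute chosen for clarity. The helper names and the sample data are assumptions, and M is taken to be odd.

```python
def moving(seq, M, op):
    """Apply op (max or min) over a centred window of length M, truncated at the ends."""
    r = M // 2
    n = len(seq)
    return [op(seq[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def fill_fades(seq, M=3):
    """Flat closing: moving maximum (dilation) then moving minimum (erosion)."""
    return moving(moving(seq, M, max), M, min)


def declutter(seq, M=3):
    """Zero every run of non-zero samples shorter than M time divisions."""
    out = list(seq)
    i, n = 0, len(seq)
    while i < n:
        if seq[i] > 0:
            j = i
            while j < n and seq[j] > 0:
                j += 1
            if j - i < M:
                out[i:j] = [0] * (j - i)
            i = j
        else:
            i += 1
    return out


# One bin over 14 time divisions: a tone with a one-sample fade, then an impulse.
bin_history = [0, 0, 5, 5, 0, 5, 5, 0, 0, 0, 7, 0, 0, 0]
filled = fill_fades(bin_history)
cleaned = declutter(filled)
print(filled)
print(cleaned)
```

The fade at the fifth sample is bridged before the short impulse is discarded, so the tone is kept as one continuous object.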

3.  THE  COMMUNICATIONS  SYSTEM 

Simulated  communication  scans  of  different  emission  types  have  been  generated  and  applied  to  one  of 
three  filters.  The  emission  types  considered  were: 

1. A single narrowband tone. The tone ($A\sin(2\pi f_c t)$) commences at time $t_1$ and ends at time $t_2$. Here,
$0 < \{t_1, t_2\} < T$ where T is the time of the final spectral estimation of the environment. The $f_c$ term is
the carrier frequency of the emission;

2. Several adjacent narrowband tones, co-incident in time. Here, the tones are defined by $A\sin(2\pi(f_c + n f_d)t)$
where $f_d$ is the separation between adjacent tones and n is an element of $\{n : |n| = 0, 1, 2, 3, 4, \ldots\}$.
Once again the tones start at $t_1$ and end at $t_2$ where $0 < \{t_1, t_2\} < T$ as above;

3.  Frequency  hopping  narrowband  emissions.  A  series  of  tones  with  randomly  changing  frequency 
steps  were  generated. 

Each  of  these  emissions  was  contaminated  with  Additive  White  Gaussian  Noise  (AWGN)  of  varying  levels 
and  then  filtered  using  one  of  the  filters  described  below. 

Scans  of  the  received  signals  were  constructed  by  taking  successive  spectra  of  the  time  domain  signal  and 
assembling  the  spectra  into  a  two  dimensional  (time-frequency)  file.  In  this  case  the  files  were  Nf  =  512 
frequency  bins  wide  by  Nt  =  155  successive  spectral  samples. 

The filters compared in this work were:

1. No filter. Here, the noise contaminated scan is passed through unchanged;

2. Threshold Filter. The noise floor is estimated and a threshold level set as in (6) above. Scan
data above the threshold level is passed unchanged whereas data below the threshold is
filtered out;

3. Modified Morphological Closure Filter. A modified form of the morphological filter
implementing the closure operation with a structure element as in (5) was implemented. The
dimension of the structure element (M) could be varied.

The  “quality”  of  the  filtered  scans  was  then  estimated  and  compared  over  different  noise  levels  and  with 
different  filter  types  used  in  the  filtering  stage.  The  performance  of  the  filters  was  measured  in  several 
ways. 

Firstly, the filtered scan ($X_f$) was compared with a scan of the original transmission (S). The sum squared
error between the two scans was calculated at differing noise levels.

$\mathrm{SquaredError} = \sum_{i=1}^{N} (X_i - S_i)^2$  (8)

Here, $N = N_f \times N_t$ is the total number of data elements in each scan, and $X_i \in X_f$, $S_i \in S$.


Secondly, an attempt has been made to estimate the "visibility" of the signal of interest (SOI) in the filtered
scan. Here, visibility is also relevant as it indicates the ease with which a specific SOI could be detected by
visual inspection of a rasterised representation of the 2D time-frequency scan. The visibility of the SOI
($V_{SOI}$) can be estimated by calculating the ratio of the SOI average intensity after filtering, mean($SOI_o$), to
the level of the background noise floor after filtering. This can be expressed as:

$V_{SOI} = \mathrm{mean}(SOI_o) / \mathrm{mean}(X_f)$  (9)
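
The two quality measures can be sketched as follows, assuming the squared error is the plain sum of squared element differences and the visibility is the ratio in (9). The boolean mask marking which elements belong to the SOI (`soi_mask`) is a hypothetical input introduced for the example; the paper does not say how the SOI elements are identified. The visibility is undefined if the filtered scan is identically zero.

```python
def squared_error(filtered, original):
    """Sum of squared differences between filtered scan X_f and clean scan S."""
    return sum((x - s) ** 2
               for x_row, s_row in zip(filtered, original)
               for x, s in zip(x_row, s_row))


def visibility(filtered, soi_mask):
    """Mean filtered SOI level divided by the mean level of the filtered scan, as in (9)."""
    soi = [x for x_row, m_row in zip(filtered, soi_mask)
           for x, m in zip(x_row, m_row) if m]
    everything = [x for x_row in filtered for x in x_row]
    return (sum(soi) / len(soi)) / (sum(everything) / len(everything))


S = [[0, 4, 0], [0, 4, 0]]                 # clean transmission (toy data)
Xf = [[1, 4, 0], [0, 3, 0]]                # filtered noisy scan (toy data)
mask = [[False, True, False], [False, True, False]]
err = squared_error(Xf, S)
vis = visibility(Xf, mask)
print(err, vis)
```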

4.  RESULTS 

The squared error obtained when each of the filters is used with received scans at different noise levels is
shown graphically in Figure 1 for the case of the Frequency Hopping SOI. The noise level is measured as
the Carrier to Noise ratio (CNR) in dB. From this graph it can be seen that, as expected, the threshold filter
substantially reduces the squared error compared with no filter as the noise level increases. The further
reduction in squared error obtained when the morphological filter is used is due to the filtering of all clutter
less than M=3 raster periods in length.

SOI visibility for M = 2 and M = 3 is depicted in Figures 2 and 3 respectively. In Figure 3 it can be seen
that the morphological filter with M = 3 gives very high visibility at lower background noise floor levels.
However, as the noise increases, the morphological filter performance degrades giving, at best, a 4dB
improvement in CNR over threshold filtering. The degradation in the morphological filter performance is
due to the presence of large bursts of noise and the increasing number of fades in the SOI. The filter will
remove fragments of the SOI when the fades produce excessive segmentation.

5.  CONCLUSION 

The  modified  morphological  filter  developed  in  this  work  has  been  shown  to  reduce  the  error  in  received 
time-frequency  scans  of  an  AWGN  channel  with  several  different  SOI.  For  a  given  value  of  background 
noise,  the  morphological  filter  is  seen  to  perform  better  than  conventional  thresholding  of  the  input  scan 
and  also  when  no  filtering  is  used. 

For  the  signals  tested,  the  M=3  morphological  filter  has  been  shown  to  increase  the  SOIo/noise  floor  ratio 
providing  approximately  a  4dB  gain  in  CNR  over  threshold  filtering.  This  would  effectively  extend  the 
detectable  range  of  a  SOI  to  an  automated  or  manual  detection  system. 

A potential application of this filter includes that of a "front-end" for a signal classifier system. The ability
of the filter to increase the visibility of frequency hopping signals could also be developed.

FIG. 1. Squared Error vs. CNR (SOI = FH)
FIG. 3. SOI Visibility - FH - Morph Window Length = 3




A Speech Segmentation Algorithm with Application to Speaker Identification.

By:  O.P.  Kenny,  R.C.  Price,  and  J.  Willmore. 


Radar Target Identification Group, and Information Management and Fusion Group,
Defence Science and Technology Organisation, P.O. Box 1500, Salisbury, 5108, Australia.


ABSTRACT. 

Speaker identification has become an increasingly useful tool in areas such as forensics and for military
applications. Commonly used speaker identification systems use statistical models to represent
speech [1]. A speaker is chosen according to the likelihood of the test speech under that speaker's statistical
model. In practical situations, it is common for more than one speaker to be present.

The  focus  of  this  paper  is  to  determine  if  there  is  an  advantage  in  segmenting  speakers  before  forming  the  log- 
likelihood  scores  for  test  speech  utterances.  Our  aim  is  to  segment  the  speakers  without  adversely  increasing  the 
processing  required  for  speaker  identification.  Consequently,  this  constraint  was  reflected  in  the  implementation 
of  the  segmentation  algorithm. 

Experimental  results,  based  on  marked  data,  have  shown  the  performance  of  the  segmentation  algorithm  varies 
depending  on  the  set  of  speakers.  It  has  been  found  that  segmentation  performance  is  estimated  to  be  between  70 
and  95  percent.  An  improvement  to  the  performance  of  speaker  identification  was  also  observed,  after 
segmentation  had  been  performed. 

1.0  Introduction. 

Speaker  identification  has  been  the  subject  of  research  from  the  mid  seventies  through  to  the  present  day.  Early 
speaker  identification  approaches  used  long-term  averages  of  acoustic  features  such  as  the  pitch  [2],  [3].  Long 
term  averaging  smoothed  out  phonetic  variations  leaving  the  speaker’s  vocal  tract  shape.  This  approach  requires 
integration  over  long  periods  of  speech,  of  the  order  of  20s.  Other  approaches  compared  individual  phonetic 
sounds  that  compose  the  utterance  [4],  [5].  This  comparison  gave  a  measure  of  the  distance  from  one  speaker  to 
another  rather  than  the  textual  information.  These  methods  used  hidden  Markov  models  for  phonetic  structure 
detection.  Another  commonly  used  approach  is  based  on  neural  networks  [6],  [7].  This  uses  a  closed  set  of 
speaker  training  data  to  form  decision  boundaries  for  the  speakers. 

We used two speaker identification approaches: a statistical model and a neural net-based approach [8]. The
statistical  model  system  uses  a  Gaussian  mixture  model  to  represent  the  statistical  information  for  the  speaker’s 
speech.  Training  data  is  used  to  form  the  parameter  set  of  the  model.  Speaker  identification  is  then  achieved  by 
obtaining  log-likelihood  scores  for  test  utterances  for  a  given  speaker  model.  The  speaker  chosen  is  the  one  that 
gives  the  maximum  log-likelihood  score. 

The  focus  of  this  paper  is  to  find  a  computationally  efficient  algorithm  to  separate  a  set  of  speakers  before 
identification.  The  speakers  are  segmented  into  two  groups  and  used  to  form  log-likelihood  scores  against  each 
speaker  model.  The  identified  speaker  is  chosen  to  be  the  maximum  in  the  class  and  the  furthest  away  from  the 
background  speakers. 

Speaker separation has appeared in the literature. Chen, Brown, and Bovey [9] developed an algorithm for speech
segmentation for the work environment. This algorithm used pre-captured speech samples of potential speakers
and found use as an audio indexing system.

Cohen and Lapidus [10] investigated the problem of unsupervised speech segmentation in telephone conversations.
In  this  case,  no  a  priori  speaker  information  was  assumed.  This  algorithm  accepted  dual  telephone  speech  data, 
detected  events  of  simultaneous  speakers,  and  then  segmented  the  speech  assigning  each  to  a  group.  Takagi  and 
Itahashi  [11]  clustered  Japanese  utterances  using  seven  prosodic  feature  vectors.  Sugiyama,  Murakami  and 
Watanabe  [12]  investigated  speech  segmentation  and  clustering  for  an  unknown  number  of  multiple  signal  sources 
based  on  ergodic  hidden  Markov  models  where  each  speaker  corresponded  to  a  single  signal  source.  Other 
segmentation  algorithms  that  have  been  reported  include  those  in  [13]  and  [14]. 

This paper first reviews speaker identification based on statistical models. This is followed by the
description of the speaker segmentation algorithm. Finally, experiments are used to determine the
effectiveness of this algorithm and to determine if it provides an improvement to speaker identification.

2.0  Review  of  the  Speaker  Identification  based  on  Statistical  Models. 

Common speaker identification systems are based on statistical models. The speech first undergoes preprocessing,
including speech activity detection and conversion to its corresponding cepstral coefficients. These coefficients
describe the predominant physiological factors that distinguish one person's voice from another [1].

Statistical models are formed using Gaussian mixture models (GMMs); these are weighted summations of
Gaussian functions, each having its own mean and variance. These weightings, means, and variances form the
parameter set for a particular speaker. The parameter set can be estimated using the expectation maximization
algorithm or a form of vector quantisation from a source of training data.

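The probability of a feature vector, $\vec{x}_t$, given a parameter set, $\lambda$, is given by,

$p(\vec{x}_t \mid \lambda) = \sum_{i=1}^{M} p_i \, b_i(\vec{x}_t)$  (1)

where

$b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)^{\top} \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i) \right\}$  (2)

and D is the dimension of the feature vector. The parameter set for the GMM is denoted as
$\lambda = \{ p_i, \vec{\mu}_i, \Sigma_i \}$ for $i = 1, \ldots, M$ with the constraint that
$\sum_{i=1}^{M} p_i = 1$. It is assumed that the time observations for the feature vectors, $\vec{x}_t$, are statistically independent of
each other. Consequently, the probability for a series of observations, $X = \{ \vec{x}_1, \ldots, \vec{x}_T \}$, for a given model is,

$p(X \mid \lambda) = \prod_{t=1}^{T} p(\vec{x}_t \mid \lambda)$.  (3)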

Speaker identification is achieved by choosing the most probable model given the observation vectors,

$\hat{S} = \arg\max_{1 \le k \le S} \Pr(\lambda_k \mid X)$.  (4)

Bayes' theorem allows us to rearrange Eqn (4) to give,

$\hat{S} = \arg\max_{1 \le k \le S} \frac{p(X \mid \lambda_k) \Pr(\lambda_k)}{p(X)}$.  (5)

Eqn (5) can be reduced from the fact that certain probabilities equate to unity or have no direct influence on the
maxima. Consequently, speaker identification is obtained by choosing the maximum likelihood of the observation
vectors given a model, formalized as,

$\hat{S} = \arg\max_{1 \le k \le S} p(X \mid \lambda_k)$.  (6)

A further simplification is made from statistical independence and use of logarithms to yield,

$\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\vec{x}_t \mid \lambda_k)$.  (7)

Intuitively, if there are observations that do not belong to the true speaker, this affects the log-likelihood scores
and may impose a bias on the result.
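
A minimal sketch of maximum log-likelihood identification with Gaussian mixture models is given here for illustration. It assumes diagonal covariance matrices, which is a simplification not stated in the paper, and the model names, parameters and test vectors are made up for the example.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log of a Gaussian density with diagonal covariance (assumed simplification)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_gmm(x, model):
    """Log of the weighted mixture; model is a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in model]
    top = max(terms)                      # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def identify(X, models):
    """Sum per-frame log-likelihoods for each model and return the best name and all scores."""
    scores = {name: sum(log_gmm(x, mdl) for x in X) for name, mdl in models.items()}
    return max(scores, key=scores.get), scores


models = {
    "A": [(0.5, [0.0], [1.0]), (0.5, [2.0], [1.0])],   # hypothetical speaker A
    "B": [(1.0, [6.0], [1.0])],                        # hypothetical speaker B
}
X = [[0.1], [1.9], [0.5]]                              # toy one-dimensional features
best, scores = identify(X, models)
print(best, scores)
```

Frames from a second speaker would add terms to every sum, which is how a mixed utterance can shift the maximum away from the true speaker.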

3.0  Segmentation  Algorithm. 

The paradigm for the segmentation algorithm is to partition the speech waveform into separate segments. Each of
these segments belongs to a speaker, and is consequently referred to as an utterance unit. The locations of these
utterance units are denoted by their start and stop positions in the speech waveform. The statistical parameters for
the target speakers were already known. The target speaker statistical models initiate the partitioning of the
utterance units into separate groups. Each group, corresponding to a speaker, forms the observation vectors for one
person in the conversation. Normalized log-likelihood scores are obtained for the target speakers against each
group. The most likely speaker has the maximum likelihood score and the greatest distance from the background
speaker. A more detailed discussion of the paradigm's steps now follows.

In some practical situations, such as telephone conversations, not only is there more than one speaker but other
types of signal as well. These signals include dial tone frequency modulation, faxes, or modems. It is necessary to
first remove those signals that are not speech. This processing is not discussed in detail here, but its
end result is speech data that has been converted to a 16 bit linear format.

Once in the proper format, a speech activity detector is used to produce indexing to locate utterance units. The
utterance units are extracted from the speech with use of an energy envelope detector. The time duration used in
this detector was of the order of three seconds, and formed continuous speech energy samples.

Generally, a region where the signal has a large energy concentration corresponds to "voiced" speech. It is these
regions that contain physiological information about the speaker and are used to train statistical models. To
determine the utterance location the signal energy must exceed a threshold. This threshold value is dynamically
set in the sense that it adapts to environmental noise changes. To determine the threshold value an estimate of the
signal to noise ratio is obtained by sorting the energy vector from lowest to highest value. Noise and signal energy
estimates are obtained from the lower and upper quarters of the sorted energy vector, respectively. The value of the
threshold is formulated by an equation, being a function of SNR, and takes into account the absence of speech signals.

To this point, the resulting speech segments are regions where the signal energy exceeds the calculated threshold
and belong to either speaker. However, for better performance it is desirable to have as many feature vectors
assigned to a segment as possible. To achieve longer time duration segments the length and distance between
segments are noted. The desire is to concatenate segments to form longer segments. If the distance between two
adjacent segments is less than the syllabic rate then the two segments are concatenated. This procedure is followed
for the entire signal with the resulting utterance unit lengths being several words long. On completion of the
concatenation process, the utterance units of smaller length are discarded to minimize the variance of the speaker
log-likelihood estimates.
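
The thresholding and concatenation steps can be sketched as follows. The paper does not give its threshold equation, so the rule in `energy_threshold` (a point part-way between the noise and signal estimates, set by the hypothetical parameter `factor`) is only a placeholder. Likewise `max_gap` stands in for the syllabic-rate test and `min_len` for the minimum utterance unit length; all names and values are assumptions.

```python
def energy_threshold(energy, factor=0.5):
    """Estimate noise and signal energy from the lower and upper quarters of the sorted values.

    The returned threshold is a placeholder rule, not the paper's equation.
    """
    s = sorted(energy)
    q = max(1, len(s) // 4)
    noise = sum(s[:q]) / q
    signal = sum(s[-q:]) / q
    return noise + factor * (signal - noise)


def merge_segments(segments, max_gap, min_len):
    """Concatenate segments separated by less than max_gap, then drop short results."""
    merged = []
    for start, stop in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [(a, b) for a, b in merged if b - a >= min_len]


thr = energy_threshold([1, 1, 1, 1, 9, 9, 9, 9])
units = merge_segments([(0, 10), (12, 30), (50, 53), (80, 120)], max_gap=5, min_len=10)
print(thr, units)
```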

The utterance units are normalized and used to generate cepstral coefficients of time duration lengths of 20ms.
The coefficients are formed from an auto-correlation function and taking the inverse Fourier transform of the
logarithm of the spectrum. The first cepstral coefficient is discarded since it corresponds to the power of the signal
and does not convey information about the speech process. Finally, the mean of the coefficients is subtracted and
the result is passed on to form speaker log-likelihood scores.

Log-likelihood scores are obtained using Gaussian mixture models whose parameters are for the target speakers.
The result is an array of log-likelihood scores against each target speaker and utterance unit. Each element of the
array is obtained by,

$\log p(X^{l} \mid \lambda_k) = \sum_{t=1}^{T_l} \log p(\vec{x}_t^{\,l} \mid \lambda_k)$,  (8)

where $\vec{x}_t^{\,l}$ denotes the cepstral coefficient vector for the $l$-th utterance unit and $t$-th time observation, and $T_l$ is
the number of time observations in that unit. The relationship
between the target speakers is used to determine to which group the utterance unit belongs. The log-likelihood array
can be viewed as a set of segmentation feature vectors for a set of observations. The feature vectors are formed
from the log-likelihood score for each target speaker [15], and the time observations correspond to the utterance
units. The task now is, given these feature vectors for a set of observations, to separate them into two separate
groups.

From the set of feature vectors two vectors are chosen that give the maximum separation distance between them,
assuming one belongs to each speaker. An initial projection vector is formulated using these two feature vectors.
This projection vector is constructed in such a way that the common information is subtracted. The resultant
projection operator places the feature vectors, for each time observation, into positive and negative half-spaces
that separate the speakers.
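
A rough sketch of this initial split is given here. It is one plausible reading, not the authors' construction: the projection vector is taken as the difference of the two most distant feature vectors, and the "common information" is interpreted as their midpoint, which is subtracted before projecting. The function name and the toy score vectors are assumptions.

```python
def initial_split(vectors):
    """Split feature vectors into two half-spaces using the most distant pair."""
    best = (-1.0, 0, 1)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            d2 = sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            if d2 > best[0]:
                best = (d2, i, j)
    _, i, j = best
    a, b = vectors[i], vectors[j]
    w = [ai - bi for ai, bi in zip(a, b)]            # projection direction
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]    # assumed "common" part
    proj = [sum(wk * (vk - mk) for wk, vk, mk in zip(w, v, mid)) for v in vectors]
    return [p > 0 for p in proj], proj


# Toy log-likelihood scores of five utterance units against two target speakers.
units = [(-10, -20), (-11, -19), (-21, -9), (-20, -10), (-9, -22)]
labels, proj = initial_split(units)
print(labels)
```

The sign of each projection assigns an utterance unit to one of the two groups; the iterative re-estimation described next would refine the direction `w`.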

Iteration  is  used  to  re-estimate  the  projection  vector.  The  projected  observations  are  sorted  and  percentiles  of  the 
feature  vectors  are  grouped.  The  percentile  is  increased  after  each  iteration.  This  process  chooses  feature  vectors 
that best describe the group. A common feature for each group is extracted from the principal component of the
singular  value  decomposition  and  the  corresponding  projection  vector  reconstructed.  After  the  completion  of  the 
iteration  the  resulting  projection  vector  is  used  to  segment  each  speaker  onto  a  half  space. 

Once  the  utterance  units  have  been  grouped  then  the  overall  log-likelihood  for  each  group  is  obtained  by  the 
summation  of  the  individual  log-likelihoods  for  each  utterance  unit.  Speaker  identification  can  then  be  made. 

4.0 Experimental results.

In this section an example is given to demonstrate the segmentation algorithm and to show the effect that mixed
speakers have on log-likelihood scores. The second experiment performed evaluates the effectiveness of speaker
segmentation before performing speaker identification. The conclusions drawn from this experiment were based
on several conversations.

Figure  1  shows  a  scatter  plot  for  a  two-person  conversation  illustrating  each  speaker  has  been  separated.  Figure  2 
presents  the  result  of  the  log-likelihood  scores  for  the  mixed  speakers.  This  figure  also  gives  the  log-likelihood 
scores  after  the  two  speakers  have  been  separated.  The  true  speaker  was  speaker  number  1.  Before  splitting  the 
maximum  log-likelihood  corresponded  to  speaker  number  two,  and  after  splitting  it  corresponded  to  the  correct 
speaker.  The  intent  of  this  plot  was  to  demonstrate  the  fact  that,  in  some  cases,  a  speaker  can  bias  the  true  log- 
likelihood  scores.  Preliminary  results  have  shown  there  is  an  improvement  in  speaker  identification  using 
splitting  in  the  testing  procedure. 

5.0 Conclusion.

This paper has reported on the hypothesis that speaker separation improves the performance of speaker
identification. The situation investigated was for the two-speaker case. Utilizing the information in the
log-likelihood scores of the utterance units made it possible to separate the speakers. Regrouping and adding the
log-likelihood scores results in a set of scores for each speaker. We then evaluated speaker identification to determine
whether splitting improves the performance of the overall speaker identification.

Figure 1 Scatter plot for the Segmentation of the Speakers.

Figure 2 Plot of the log-likelihood scores (horizontal axis: speaker number).


I. INTRODUCTION

There are several signal processing applications where a variety of sensors are
available for measuring a given process; however, physical and computational constraints
may impose the requirement that at each time instant, one is able to use
only one out of a possible total of M sensors. There is also growing interest in
flexible sensors such as multi-mode radar which can be configured to operate in one
of many modes for each measurement. In such cases, one has to make the decision:
which sensor (or mode of operation) should be chosen at each time instant
to provide the next measurement? It may also happen that one can associate with
each type of measurement a per unit-of-time measurement cost, reflecting the fact
that some measurements are more costly or difficult to make than others, although
they may contain more useful or reliable information. The problem of optimally
choosing which one of the M sensor observations to pick at each time instant is
called the sensor scheduling problem. The resulting time sequence which at each
instant specifies the best sensor to choose is termed the sensor schedule sequence.

Several papers have studied the sensor scheduling problem for systems with linear
Gaussian dynamics where linear measurements in Gaussian noise are available
at a number of sensors (see [1] for the continuous-time problem and [7] for the
discrete-time problem). For such linear Gaussian systems, if the cost function to
be minimized is the state error covariance (or some other quadratic function of the
state), then the solution has a nice form: the optimal sensor schedule sequence can
be determined a priori and is independent of the measurement data (see [1], [7] for
details). This is not surprising, since the Kalman filter covariance is independent
of the observation sequence.
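
To illustrate why the schedule can be fixed a priori in the linear Gaussian case,
the following minimal Python sketch iterates a scalar Kalman covariance recursion
and searches over schedules offline. The scalar model, the parameter values (a, q
and the sensor list) and the function names are assumptions made for illustration
and are not taken from [1] or [7]; the exhaustive search merely stands in for
whatever optimization those papers use.

```python
from itertools import product

def riccati_step(p, h, r, a=0.9, q=1.0):
    """Scalar Kalman covariance update for x' = a x + w, y = h x + v.

    p is the previous error variance, q = var(w), r = var(v).
    No measurement value appears anywhere in this recursion.
    """
    p_pred = a * a * p + q
    gain = p_pred * h / (h * h * p_pred + r)
    return (1.0 - gain * h) * p_pred

def schedule_cost(schedule, sensors, p0=1.0):
    """Accumulated error variance when sensor schedule[k] is used at step k."""
    p, total = p0, 0.0
    for index in schedule:
        h, r = sensors[index]
        p = riccati_step(p, h, r)
        total += p
    return total

def best_schedule(sensors, horizon):
    """Exhaustive offline search; feasible only for short horizons."""
    candidates = product(range(len(sensors)), repeat=horizon)
    return min(candidates, key=lambda s: schedule_cost(s, sensors))

# Hypothetical sensors given as (gain h, noise variance r) pairs.
SENSORS = [(1.0, 1.0), (1.0, 4.0)]
schedule = best_schedule(SENSORS, horizon=3)
```

Because `schedule_cost` never touches a measurement, the whole search can run
before any data arrive. This is exactly the property that is lost in the HMM
case studied in this paper.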

In  this  paper  we  study  the  discrete-time  sensor  scheduling  problem  when  the 
underlying  process  is  a  finite  state  Markov  chain  that  is  observed  in  white  noise. 
The  signal  model  is  as  follows:  At  each  time  instant,  observations  of  a  Markov 
chain  in  white  noise  are  made  at  M  different  sensors.  However,  only  one  sensor 
observation  can  be  chosen  at  each  time  instant.  The  aim  is  to  devise  an  algorithm 
that  optimally  picks  which  single  sensor  to  use  at  each  time  instant,  in  order  to 
minimize a given cost function. We will show that, unlike the linear Gaussian case,
the optimal sensor schedule in the hidden Markov model (HMM) case is data dependent.
This means that past observations together with past choices of which observation to
pick influence which observation to choose at present.

In our recent work [4], we formulated the HMM sensor scheduling problem and
presented  an  infinite  dimensional  dynamic  programming  functional  recursion  for  its 
solution.  An  approximate  algorithm  was  then  presented  in  [4]  which  was  based  on 
discretizing  the  dynamic  programming  recursion  to  a  finite  grid. 

The main contribution of this paper is to present an optimal finite dimensional
solution to the HMM sensor scheduling problem. Indeed we show that the solution
to the dynamic programming equation is piecewise linear and convex. An algorithm
is given for computing these piecewise linear segments. The finite dimensional
scheduling algorithms presented in this paper are similar to those recently used
in operations research (see [6] for a tutorial survey) and in robot navigation
systems [3] for the optimal control of Partially Observed Markov Decision Processes
(POMDPs). However, our problem has the added complexity that the cost function
is a quadratic function of the information state, whereas the standard POMDP
problem consists of a cost that is linear in the information state. We show that by a
novel change of coordinates, the problem can be re-expressed as a standard Hidden
Markov Model control problem and optimally solved using algorithms similar to
those used for solving POMDPs.

2.  SIGNAL  MODEL  AND  PROBLEM  FORMULATION 

Let $k = 0, 1, \ldots$ denote discrete time. Assume $X_k$ is an $S$-state Markov chain
with state space $\{e_1, \ldots, e_S\}$. Here $e_i$ denotes the $S$-dimensional unit vector with
1 in the $i$-th position and zeros elsewhere. This choice of using unit vectors to
represent the state space considerably simplifies our subsequent notation. Define
the $S \times S$ transition probability matrix $A$ as

$$A = [a_{ji}]_{S \times S} \quad \text{where} \quad a_{ji} = P(X_k = e_i \mid X_{k-1} = e_j), \quad i, j \in \{1, \ldots, S\}.$$

Denote the initial probability vector $\pi_0$ of the Markov chain as

$$\pi_0 = [\pi_0(i)]_{S \times 1} \quad \text{where} \quad \pi_0(i) = P(X_0 = e_i), \quad i \in \{1, \ldots, S\}.$$
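
As a small illustration of this notation, the sketch below builds a unit vector
$e_i$ and propagates a probability vector one step through the chain as $A'\pi$,
using the convention $a_{ji} = P(X_k = e_i \mid X_{k-1} = e_j)$. The three-state
matrix and the helper names `unit_vector` and `predict` are hypothetical and
chosen only for illustration.

```python
def unit_vector(i, size):
    """Return e_i (1-based index i) as a list of length `size`."""
    return [1.0 if j == i - 1 else 0.0 for j in range(size)]

def predict(A, pi):
    """One-step prediction A' pi, where A[j][i] = P(X_k = e_i | X_{k-1} = e_j)."""
    size = len(pi)
    return [sum(A[j][i] * pi[j] for j in range(size)) for i in range(size)]

# Illustrative transition matrix; each row sums to one.
A = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

pi0 = unit_vector(2, 3)   # chain known to start in state e_2
pi1 = predict(A, pi0)     # reproduces the second row of A
```

Starting from a unit vector, the prediction simply reads off the corresponding
row of $A$, which is one reason the unit vector representation keeps the later
filtering equations compact.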


2.1.  Sensor  Scheduling  Problem 

Assume there are $L$ noisy sensors available which can be used to give measurements
of $X_k$. At each time instant $k$, we are allowed to pick only one of the $L$
possible sensor measurements. Motivated by the physical and computational
constraints alluded to in the introduction, we assume that having picked this sensor,
we are not allowed to look at any of the other $L - 1$ observations at time $k$.

Let $u_k \in \{1, \ldots, L\}$ denote the sensor picked at time $k$. The observation measured
by this sensor is denoted as $y_k(u_k)$. Suppose at time $k$ we picked the $l$-th sensor,
i.e. $u_k = l$, where $l \in \{1, \ldots, L\}$. We assume that the measurement $y_k(l)$ of the
$l$-th sensor belongs to a known finite set of symbols $O_1(l), O_2(l), \ldots, O_{M_l}(l)$. That
is, the $l$-th sensor can yield one of $M_l$ possible measurement values at a given time
instant. For $u_k \in \{1, \ldots, L\}$, denote the symbol probabilities as

$$b_i(u_k = l, y_k(u_k) = O_m(l)) = P(y_k(u_k) = O_m(u_k) \mid X_k = e_i, u_k = l), \quad i = 1, 2, \ldots, S.$$

These represent the probability that an output $O_m(l)$ is obtained given that the state
of the Markov chain is $e_i$ and that the $l$-th sensor is chosen. The symbol probabilities
are assumed known. Finally define the symbol probability matrix

$$B(u_k, O_m(u_k)) = \mathrm{diag}\left[\, b_1(u_k, O_m(u_k)), \ldots, b_S(u_k, O_m(u_k)) \,\right].$$

Finally, for notational convenience let

$$\phi = (A, B(l, O_m(l))), \quad l = 1, \ldots, L, \quad m = 1, \ldots, M_l \tag{1}$$

denote the entire parameter vector, which comprises the transition probability
matrix and all the symbol probabilities of all the $L$ sensors.

Let $Y_k = \{u_1, u_2, \ldots, u_k, y_1(u_1), y_2(u_2), \ldots, y_k(u_k)\}$ so that $Y_k$ represents the
information available at time $k$ upon which to base estimates and sensor scheduling
decisions. The sensor scheduling and estimation problem proceeds in three stages
for each $k = 0, 1, \ldots, N - 1$, where $N$ is a fixed positive integer.

1) Scheduling: Based on $Y_k$ we generate $u_{k+1} = \mu_{k+1}(Y_k)$, which determines
which sensor is to be used at the next time step.

2) Observation: We then observe $y_{k+1}(u_{k+1})$, where $u_{k+1}$ is the sensor selected
in the previous stage.

3) Estimation: After observing $y_{k+1}(u_{k+1})$ we generate our best estimate $\hat{X}_{k+1}$
of the state of the Markov chain $X_{k+1}$ as $\hat{X}_{k+1} = E\{X_{k+1} \mid Y_{k+1}\}$. Here $\hat{X}_{k+1}$ denotes
the Hidden Markov Model filtered state estimate at time $k + 1$ defined as

$$\hat{X}_{k+1} = E\{X_{k+1} \mid Y_{k+1}\} = \sum_{i=1}^{S} e_i \, P(X_{k+1} = e_i \mid Y_{k+1}), \qquad \hat{X}_0 = \pi_0.$$

Note that the state estimate $\hat{X}_{k+1}$ is dependent on the scheduling sequence of
sensors picked from time 1 to $k + 1$, i.e. $u_1, \ldots, u_{k+1}$ (since it depends on $Y_{k+1}$).

With these steps in mind, we define the sensor scheduling sequence

$$\mu = \{\mu_1, \mu_2, \ldots, \mu_N\}$$

and say that the scheduling sequences are admissible if $\mu_{k+1}$ maps $Y_k$ to $\{1, \ldots, L\}$.

Note that $\mu$ is a sequence of functions.

We assume the following cost is associated with estimation errors and with the
particular sensor schedule chosen. If, based on the observation at time $k$, the decision
is made at time $k$ to choose the $l$-th sensor, $l \in \{1, \ldots, L\}$, at time $k + 1$, i.e. $u_{k+1} = l$,
the instantaneous cost incurred at time $k$ is

$$(X_k - \hat{X}_k)' R_k(l) (X_k - \hat{X}_k) + c_k(X_k, l). \tag{2}$$

Here $R_k(l)$, $l = 1, 2, \ldots, L$, are known positive definite weighting matrices and
$c_k(X_k, l)$ is the cost of using sensor $l$ when the Markov chain is in state $X_k$.
Our aim is to find the optimal sensor schedule to minimize the total accumulated
cost $J_\mu$ from time 1 to $N$ over the set of admissible control laws:

$$J_\mu = E\left\{ \sum_{k=0}^{N-1} (X_k - \hat{X}_k)' R_k(u_{k+1}) (X_k - \hat{X}_k) + \sum_{k=0}^{N-1} c_k(X_k, u_{k+1}) \right\} + E\left\{ (X_N - \hat{X}_N)' R_N (X_N - \hat{X}_N) \right\} \tag{3}$$

where $u_{k+1} = \mu_{k+1}(Y_k)$.

FIG. 1. The Sensor Scheduling and Estimation Problem

The above objective (3) can be interpreted as follows. The minimization of the
first summation results in the optimal sensor schedule that minimizes the weighted
mean square error in the state estimate of the Markov chain state $X_k$. In particular,
if $R_k$ is set as the identity matrix for all time $k$, minimization of (3) yields the optimal
sensor $u_k \in \{1, 2, \ldots, L\}$ to pick at each time instant $k$, $k = 1, 2, \ldots, N$, to yield
the minimum mean square error state estimate of the Markov chain. The weight
terms $R_k(l)$ allow different sensors $l \in \{1, 2, \ldots, L\}$ to be weighted differently.
The time index in $R_k$ allows us to weight the state estimate errors over time.

The second summation term reflects the cost involved in using a sensor (i.e. the
unit time sensor charge) when the Markov chain is in a particular state. The
final term is the terminal cost at time $N$.

2.2. Information State Formulation

As it stands, the above HMM sensor scheduling problem is a partially observed
finite horizon stochastic control problem, since the cost (3) is accumulated over a
fixed horizon $N$. As is standard with such stochastic control problems, in this
section we convert the partially observed stochastic control problem to a fully
observed stochastic control problem defined in terms of the information state [5].

The information state at time $k$, which we will denote by $\pi_k$ (a column vector
of dimension $S$), is merely the conditional filtered density of the Markov chain $X_k$
given the observation history $Y_k$. That is, $\pi_k(i) = P(X_k = e_i \mid Y_k)$, $i = 1, 2, \ldots, S$.
Also, because we have assumed that $X_k$ is a unit vector, $X_k \in \{e_1, \ldots, e_S\}$, we
straightforwardly have $\pi_k = \hat{X}_k$. (This is one of the notational advantages of depicting
the state space by unit vectors.)

The information state $\pi_k$ is a sufficient statistic to describe the current state of an
HMM (see [2] and [5]). The information state update is computed straightforwardly
by the HMM state filter (also known as the "forward algorithm")

$$\pi_{k+1} = \frac{B(u_{k+1}, y_{k+1}(u_{k+1})) \, A' \pi_k}{\mathbf{1}_S' \, B(u_{k+1}, y_{k+1}(u_{k+1})) \, A' \pi_k} \tag{4}$$

where $\mathbf{1}_S$ represents an $S$-dimensional vector of ones.
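
The update (4) can be sketched in a few lines of Python. The two-state chain,
the two hypothetical sensors with binary symbols, and the names
`hmm_filter_step`, `SYMBOL_PROB` and `run_filter` are assumptions made purely
for illustration; only the recursion itself follows (4).

```python
def hmm_filter_step(A, b, pi):
    """Information state update (4).

    A[j][i] = P(X_{k+1} = e_i | X_k = e_j); b[i] is the probability of the
    received symbol under state e_i for the chosen sensor, i.e. the diagonal
    of B(u_{k+1}, y_{k+1}(u_{k+1})).
    """
    size = len(pi)
    predicted = [sum(A[j][i] * pi[j] for j in range(size)) for i in range(size)]
    unnormalized = [b[i] * predicted[i] for i in range(size)]
    total = sum(unnormalized)          # the denominator 1_S' B A' pi
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [x / total for x in unnormalized]

# Illustrative two-state chain.
A = [[0.9, 0.1],
     [0.2, 0.8]]

# SYMBOL_PROB[sensor][symbol][i] = P(y = symbol | X = e_i, u = sensor)
SYMBOL_PROB = {
    1: {0: [0.9, 0.2], 1: [0.1, 0.8]},   # accurate sensor
    2: {0: [0.6, 0.4], 1: [0.4, 0.6]},   # noisy sensor
}

def run_filter(pi0, sensors, symbols):
    """Apply (4) along a given sequence of sensor choices and received symbols."""
    pi = list(pi0)
    for u, y in zip(sensors, symbols):
        pi = hmm_filter_step(A, SYMBOL_PROB[u][y], pi)
    return pi

pi = run_filter([0.5, 0.5], sensors=[1, 2, 1], symbols=[0, 0, 1])
```

Each pass through the loop corresponds to one observation and estimation stage
of Section 2.1, with the sensor sequence supplied from outside rather than
computed by a scheduling policy.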

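Let $\mathcal{P}$ denote the set of all information states $\pi$ that can be achieved by the
sensors given the parameter vector $\phi$ defined in (1). That is,

$$\mathcal{P} = \{\pi \in \mathbb{R}^S : \mathbf{1}_S' \pi = 1, \; 0 \le \pi(i) \le 1 \text{ for all } i \in \{1, \ldots, S\}\} \tag{5}$$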

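Using the smoothing property of conditional expectation, the cost functional of
(3) can be rewritten in the form

$$J_\mu = E\left\{ C_N(\pi_N) + \sum_{k=0}^{N-1} C_k(\pi_k, \mu_{k+1}(\pi_k)) \right\} \tag{6}$$

where

$$C_N(\pi_N) = \sum_{i=1}^{S} (e_i - \hat{X}_N)' R_N (e_i - \hat{X}_N) \, \pi_N(i),$$

$$C_k(\pi_k, u_{k+1}) = \sum_{i=1}^{S} \left[ (e_i - \hat{X}_k)' R_k(u_{k+1}) (e_i - \hat{X}_k) + c_k(e_i, u_{k+1}) \right] \pi_k(i),$$

$$C_0(\pi_0, u_1) = \sum_{i=1}^{S} \left[ c_0(e_i, u_1) + (e_i - \pi_0)' R_0(u_1) (e_i - \pi_0) \right] \pi_0(i).$$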

Substituting $\hat{X}_k = \pi_k$ in the above equations, after some algebraic manipulations
we can write $C_k$ as a quadratic function of the information state $\pi$ as follows:

$$C_k(\pi_k, u_{k+1}) = -\pi_k' R_k(u_{k+1}) \pi_k + g_k'(u_{k+1}) \pi_k \tag{7}$$

where $g_k(u)$ denotes the $S$ dimensional vector with elements

$$g_k(e_i, u_{k+1}) = e_i' R_k(u_{k+1}) e_i + c_k(e_i, u_{k+1}), \quad i = 1, 2, \ldots, S. \tag{8}$$
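
The algebra leading to (7) and (8) can be checked numerically. The sketch below
evaluates the cost once directly from its definition as a sum over states and
once in the quadratic form, for a single sensor and time step. The matrix `R`,
the cost vector `c` and the information state `pi` are arbitrary illustrative
values, and the function names are hypothetical.

```python
def quad_form(x, R, y):
    """Return x' R y for list-based vectors and a nested-list matrix."""
    return sum(x[i] * R[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def cost_direct(pi, R, c):
    """Sum over i of [(e_i - pi)' R (e_i - pi) + c_i] pi(i)."""
    size = len(pi)
    total = 0.0
    for i in range(size):
        d = [(1.0 if j == i else 0.0) - pi[j] for j in range(size)]
        total += (quad_form(d, R, d) + c[i]) * pi[i]
    return total

def cost_quadratic(pi, R, c):
    """-pi' R pi + g' pi with g_i = R_ii + c_i, as in (7) and (8)."""
    size = len(pi)
    g = [R[i][i] + c[i] for i in range(size)]
    return -quad_form(pi, R, pi) + sum(g[i] * pi[i] for i in range(size))

R = [[2.0, 0.5],
     [0.5, 1.0]]        # positive definite weighting matrix (illustrative)
c = [0.3, 0.1]          # per-state sensor usage costs (illustrative)
pi = [0.7, 0.3]         # an information state on the simplex

direct = cost_direct(pi, R, c)
quadratic = cost_quadratic(pi, R, c)
```

The two evaluations agree because the elements of $\pi$ sum to one, which
collapses the cross terms $e_i' R \pi$ into $\pi' R \pi$.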

We now have a fully observed control problem in terms of the information state
$\pi$: Find an admissible control law, $\mu$, which minimizes the cost functional of (6),
subject to the state evolution equation of (4).

2.3. Examples and Applications

1. Optimal Filtering versus Prediction: Consider the tracking problem of
measuring the state of a target from radar derived measurements. Assume that the
target’s coordinates evolve according to a finite state Markov chain $X_k$ with known
transition probability matrix $A$. Assume that at each time instant $k$ we have two
choices:

(i) $u_k = 1$: Obtain a radar derived noisy measurement of the target position
$y_k(u_k = 1)$. Assume that the noise density and hence the symbol probability matrix
$B(u_k = 1, y_k)$ is known. After observing the target’s position $y_k(u_k = 1)$, we compute
the best filtered estimate of the target’s position $X_k$ by using the HMM filter. Let
$c(X_k, 1)$ denote the cost of using the radar when the target’s true position is $X_k$.
For example, the cost $c(X_k, 1)$ would typically be large when the target $X_k$ is close
to the radar tracker.

(ii) $u_k = 2$: Do not observe the target state. This is equivalent to choosing
$B(u_k = 2, y_k) = I$, as the observation $y_k$ then contains no information about the state
of the Markov chain $X_k$. Without using the radar for observing the target, we can only
compute the best predicted estimate of the target via a Hidden Markov Model state
predictor. Let $c(X_k, 2)$ denote the cost of not using the radar.

In addition to the cost of using the radar $c(X_k, u_k)$, we also incorporate into
our cost function the mean-square estimation error of the target’s coordinates.
Suppose our aim is to choose at each time between $u_k = 1$ (obtaining a radar derived
observation and using an HMM filter) and $u_k = 2$ (not making a measurement and
using an HMM predictor) to minimize the cost function in (3). Then the problem is
identical to the sensor scheduling problem posed above.
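
The equivalence between "no measurement" and $B = I$ can be seen in a short
sketch: with all symbol probabilities set to one, the filter update reduces to
the one-step predictor $A'\pi_k$. The three-position chain, the radar
likelihoods in `RADAR` and the function names are hypothetical values chosen
for illustration only.

```python
def update(A, b, pi):
    """Filter update: normalize diag(b) A' pi."""
    size = len(pi)
    pred = [sum(A[j][i] * pi[j] for j in range(size)) for i in range(size)]
    post = [b[i] * pred[i] for i in range(size)]
    total = sum(post)
    return [p / total for p in post]

# Illustrative target motion over three positions.
A = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

# RADAR[y][i] = P(radar reports position y | target at position i)
RADAR = {0: [0.8, 0.1, 0.1],
         1: [0.1, 0.8, 0.1],
         2: [0.1, 0.1, 0.8]}

def step(pi, u, y=None):
    """u = 1: use the radar report y (filter); u = 2: no measurement (predictor)."""
    if u == 1:
        b = RADAR[y]
    else:
        b = [1.0] * len(pi)   # B(u_k = 2, y_k) = I carries no information
    return update(A, b, pi)

prior = [1.0 / 3.0] * 3
predicted = step(prior, u=2)        # equals A' prior
filtered = step(prior, u=1, y=0)    # sharpened toward position 0
```

The scheduling question is then whether the sharper `filtered` estimate is
worth the radar cost $c(X_k, 1)$ at a given time, which is what the cost (3)
trades off.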

2. Optimal Quantization Problem: Given an $S$ state Markov chain $X_k$
observed in noise, consider the following joint source coding and estimation problem,
which seeks to compute the optimal tradeoff between quantization bits, channel
transmission cost and state reconstruction. Suppose at each time $k$, one has to
choose between the following $L$ possibilities: quantize the observation to $l$ bits and
transmit these $l$ bits over the channel at a transmission cost of $c(l, X_k)$, $l = 1, \ldots, L$.
The receiver seeks to recover the best estimate of $X_k$, i.e. the minimum mean
square state estimate subject to the channel transmission cost $c(l, X_k)$. The cost
function is then identical to (3). The sensor scheduling problem then yields the
optimal answer to how many bits one must use to quantize the observations at time $k$
in order to minimize the transmission and reconstruction cost.

ABSTRACT

This  paper  proposes  intelligent  virtual  advisers  and  animated  scenes  in  virtual  realities  as  an  appropriate 
technological  aid  for  Situation  Awareness.  It  then  outlines  the  ATTITUDE  agent  architecture  as  a  basis  for  building 
intelligent  virtual  advisers. 

1.  Introduction 

Endsley  ([1])  defines  Situation  Awareness  (SA)  as  follows. 

“Situation  awareness  is  the  perception  of  the  elements  in  the  environment  within  a  volume  of 
time  and  space,  the  comprehension  of  their  meaning,  and  the  projection  of  their  status  in  the 
near  future.” 

As perception, comprehension and projection characterise mental attributes, SA is understood as a
mental phenomenon, and in the absence of anthropomorphism, is understood to be about human
minds. So viewed, SA is not a computer system or a screen; it is a state of human awareness.

Technologies  and  technological  aids  are  often  introduced  to  enhance  the  state  of  human 
awareness,  and  so  the  advancement  of  SA  is  partly  about  psychology,  partly  about  technology, 
and partly about the integration of the two. The concept of a Common Reference Picture* (CRP)
is often cited as the technological solution to SA. I am less than enthused by the CRP
recommendation  and  in  [2]  cast  a  critical  eye  over  each  of  its  common,  reference  and  picture 
components.  The  term  picture  is  typically  used  to  refer  to  a  (possibly  fused)  track  display.  As  a 
general  technological  aid  to  SA,  track  displays  place  considerable  cognitive  burden  on  the 
commander.  This  paper  proposes  a  radical  alternative  to  track  displays  as  the  primary 
technological  aid  for  SA. 

2.  Virtual  Advisers 

In  a  command  and  control  setting,  SA  is  acquired  by  being  informed  about  what  is  going  on  in  the 
world.  In  our  daily  lives  we  are  likewise  informed  of  what  is  going  on  through  the  news  services, 
with  print,  radio,  television  and  the  Internet  serving  as  the  dominant  forms.  The  last  two  afford  the 
advantage  of  being  able  to  supplement  the  text,  photographs  and  sounds  of  the  first  two  with  a 
dynamic  imaging  capability.  Television  news  broadcasts  typically  involve  news  presenters,  weather 
presenters,  sports  presenters,  reporters,  expert  interviews,  diagrams,  graphs  and  video  footage. 
The various individuals are assembled as advisers to the viewer and the visual footage is engaged
wherever possible to refine the mental images that the viewer otherwise forms from the spoken
and written words. News broadcasts provide the viewer with SA by carefully assembling these
various components to tell a story.

* Common Operating Picture in the US, Joint Operating Picture in the UK

Military commanders within command and control centres should perhaps acquire SA through
facsimiles of news broadcasts, and in some sense, existing staff briefs to these commanders are a
comparatively  limited  attempt  to  do  just  that.  But  while  these  proposed  military  news  broadcasts 
would  offer  potential  advantages  over  traditional  briefings,  they  would  still  be  impeded  by  some 
drawbacks. 

1.  The  news  broadcast  approach  to  SA  is  people  based  and  therefore  lacks  portability.  Platform 
based  radar  operators  or  fighter  controllers  do  not  have  the  luxury  of  a  team  of  advisers  to 
provide  them  with  news  broadcasts  relevant  to  their  SA. 

2.  The  news  broadcast  approach  to  SA  engages  visual  footage  wherever  possible,  but  in  the 
military  context,  the  information  available  is  often  confined  to  terse  military  messages  without 
direct  visual  footage. 

3.  Being  people  based,  the  news  broadcast  approach  to  SA  is  typically  a  one  off  performance  by 
the  various  advisers.  Ideally  the  viewer  should  be  able  to  observe  the  news  when  it  is 
convenient  for  them  to  do  so,  and  be  able  to  rewind  and  replay  parts  of  the  presentation. 

4. The news broadcast approach to SA delivers information, but prevents the viewer from
interactively conversing with the advisers to further enhance their SA.

In  response  to  these  difficulties  I  offer  the  following  suggestions. 

1.  The  news  broadcast  should  be  deliverable  by  software  in  addition  to  people.  This  facilitates 
greater  portability. 

2.  Where  appropriate,  2  and  3  dimensional  virtual  reality  animations  of  military  messages  should 
be  constructed  to  provide  animated  movie  footage  of  the  events  being  described. 

3.  The  advisers  in  conventional  news  broadcasts  should  be  replaced  by  virtual  advisers  in  the 
software.  They  will  provide  a  story  telling  counterpart  in  the  software. 

4.  The  virtual  advisers  should  be  intelligent.  They  should  be  repositories  of  particular  expertise 
and  be  capable  of  interacting  with  the  user  about  their  area  of  expertise.  The  aim  is  to  deliver 
virtual  people  who  interact  with  the  user,  their  environment,  and  one  another,  to  meet  the 
user’s  SA  needs. 

3.  Attitude 

The  remainder  of  this  paper  concerns  itself  with  how  we  construct  intelligent  virtual  advisers, 
independently  of  their  animated  form  and  their  animated  environments.  In  particular,  it  promotes 
as  a  solution,  the  Attitude  software  product  being  developed  at  the  Defence  Science  and 
Technology  Organisation.  The  predecessor  to  ATTITUDE  was  designed  as  an  intelligent  controller 
for  an  AEW  phased  array  radar,  and  so  it  is  worth  noting  that  the  virtual  advisers  made  available 
to a commander need not necessarily exist locally. They could perform various functions, be
distributed  across  various  platforms  in  the  battlespace,  and  be  made  accessible  to  the  commander 
through  networking  and  an  avatar  form. 

To  understand  why  Attitude  delivers  a  framework  for  developing  virtual  people,  it  is  necessary 
to  appreciate  its  motivation.  The  current  computer  science  paradigm  began  in  the  1940s  with  a 
communicative  gulf,  with  a  human  user  flush  with  conceptualisations  at  one  extremity  and  the 
computer  as  a  complex  electronic  switching  device  at  the  other.  The  current  computer  science 
paradigm  has  sought  to  bridge  this  gulf  by  dragging  the  computer  closer  to  the  user  by  embedding 
human conceptualisation within the machine and then interfacing those conceptualisations to the
user as if primitive thereafter. Thus we have seen the familiar progression of machine languages,
assembly  languages,  floating  point  arithmetic,  higher  level  languages,  graphical  user  interfaces, 
and  speech  processing  systems.  If  we  continue  to  pursue  this  paradigm,  then  at  the  automation 
limit  we  would  interact  with  the  computer  as  if  it  were  another  user,  and  we  would  predict  and 
explain  its  behaviour  in  a  similar  manner  to  how  we  predict  and  explain  human  behaviour. 

Figure 1: Automation Paradigm (diagram labels: Computer, Communicative Gulf, User, Automation, Automation Limit)

Humans  predict  and  explain  the  behaviour  of  other  humans  by  ascribing  mental  attitudes  to  them, 
such  as  beliefs,  desires,  expectations,  fears,  hopes,  et  cetera,  and  when  expressing  these  and  other 
mental  attitudes,  the  syntax  of  the  expression  always  assumes  the  form 
<subject>  <attitude>  that  <propositional  expression> 

The following examples illustrate.

Fred believes that the sky is blue
Tom expects that it will rain
Mary hopes that Tom is insightful

Expressions having this syntactic form are called propositional attitude expressions and the beliefs
et  cetera  that  they  denote  are  termed  propositional  attitudes.  In  a  propositional  attitude 
expression:  the  subject,  e.g.  Fred,  expresses  which  individual  has  the  propositional  attitude;  the 
propositional  expression,  e.g.  the  sky  is  blue,  expresses  some  assertion  about  the  world;  and  the 
attitude,  e.g.  believes,  expresses  the  kind  of  response  the  subject  has  toward  the  proposition.  With 
subtle  modification,  propositional  attitude  observations  such  as 
Fred  believes  that  the  sky  is  blue 

can  be  transformed  into  propositional  attitude  instructions  like 
Fred  believe  that  the  sky  is  blue. 

The  latter  is  an  instruction,  commanding  software  agent  Fred  to  believe  that  the  sky  is  blue.  A 
mechanism  of  this  form  allows  us  to  not  only  predict  and  explain  software  behaviour  at  the 
automation  limit,  but  to  also  program  software  behaviour  at  the  automation  limit.  The  use  of 
propositional  attitude  instructions  as  primitive  programming  instructions  I  call  attitude 
programming  and  Attitude  is  so  named  because  it  practices  attitude  programming.  ATTITUDE 
therefore  satisfies  two  quite  different  motivations  for  virtual  advisers:  one  emanating  from  a 
refinement  of  the  current  computer  science  paradigm,  and  the  other  coming  from  a  strategy  for 
enhancing  SA. 
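
The syntactic form of a propositional attitude instruction can be made concrete
with a minimal Python sketch that splits an instruction into its three parts.
This is only an illustration of the syntax described above; the class
`AttitudeInstruction` and the function `parse_instruction` are hypothetical
names and do not describe how ATTITUDE itself is implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttitudeInstruction:
    subject: str       # which individual holds the attitude, e.g. "Fred"
    attitude: str      # kind of response, e.g. "believe"
    proposition: str   # assertion about the world, e.g. "the sky is blue"

def parse_instruction(text):
    """Split '<subject> <attitude> that <propositional expression>'."""
    head, separator, proposition = text.partition(" that ")
    parts = head.split()
    if not separator or len(parts) != 2 or not proposition.strip():
        raise ValueError(f"not a propositional attitude expression: {text!r}")
    subject, attitude = parts
    return AttitudeInstruction(subject, attitude, proposition.strip())

instruction = parse_instruction("Fred believe that the sky is blue")
observation = parse_instruction("Mary hopes that Tom is insightful")
```

The same parser accepts both the observation form ("believes") and the
instruction form ("believe"); distinguishing the two is left to the caller in
this sketch.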

4.  Individuals  With  Attitude 

Interacting  with  the  World 

In  individual  ATTITUDE  agents,  the  world  is  described  by  the  propositional  expressions  within 
propositional  attitude  instructions.  The  world  of  every  Attitude  agent  is  understood  as  a  world 
of  facts  in  which,  in  their  most  primitive  form,  the  facts  are  expressed  as  atomic  formulae  having 
the syntactic form (<relation> <term_1> ... <term_k>). Each term represents an object and the
relation  identifies  some  relationship  between  those  objects,  e.g.  (taller  Clinton  Howard).  In  Classical 
First  Order  Logic,  the  terms  denoting  objects  are  recursively  grounded  in  symbols,  variables  and 
functions.  In  ATTITUDE,  terms  are  expressions  recursively  formed  from  symbols,  variables, 
functions, Booleans, integers, reals, subexpressions, indexed expressions, schedules, events,
scenarios, individuals and groups. As a consequence, Attitude individuals can form very
sophisticated structures to describe the world, including recursively self-referencing structures*.

Individual  Attitude  agents  operate  by  being  embedded  in  the  world.  To  facilitate  this  interaction 
they  are  often  equipped  with  sensors  and  effectors  that  allow  them  to  observe  and  alter  the  world. 
An  individual’s  conception  of  the  world  will  often  be  limited  by  what  they  can  experience,  and  this 
could  range  from  a  world  of  radar  tracks  through  to  a  sophisticated  virtual  reality  world.  The 
individual’s  conception  of  the  world  is  realised  by  the  relations  and  objects  that  it  is  able  to 
consider, and it is the role of the sensors and effectors to manage the interaction between events
in the world and the propositional expressions within the individual. Attitudes exist to facilitate
control of sensors and effectors.

Interacting  with  One  Another 

Individual  Attitude  agents  also  interact  socially.  This  is  facilitated  by  the  role  of  subjects  within 
propositional  attitude  instructions.  The  instruction  Fred  believe  that  (blue  sky)  is  an  instruction  to 
individual  Fred  to  believe  that  the  sky  is  blue,  and  so  when  it  is  executed  by  a  second  agent  Tom, 
it  causes  an  instruction  to  believe  that  the  sky  is  blue  to  be  sent  from  Tom  to  Fred.  Individuals  are 
able  to  communicate  beliefs  in  this  way.  Expectations,  anticipations  and  desires  can  also  be 
exchanged  between  individuals. 

Subjects  can  also  include  groups  of  individuals.  Groups  are  simple  sets  of  individuals  and  Boolean 
algebra  operators  are  provided  to  include  and  preclude  individuals  from  group  membership. 
Groups can also be formed through queries. Issuing the query ?who ask if believe that (blue sky)
will  store  in  variable  ?who  the  group  of  individuals  who  believe  that  the  sky  is  blue.  The  interaction 
of  individuals  enables  the  collective  to  arrive  at  outcomes  which  none  of  the  participating 
individuals  might  arrive  at  alone. 
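
As a rough illustration of belief exchange and group formation, the sketch
below models individuals as holders of sets of facts and groups as sets of
names. The class `Individual`, the functions `send_belief` and `who_believes`,
and the use of Python set operations for group membership are all assumptions
made for illustration; they are not the ATTITUDE implementation.

```python
class Individual:
    """Hypothetical stand-in for an agent holding a set of believed facts."""
    def __init__(self, name):
        self.name = name
        self.beliefs = set()

def send_belief(receiver, fact):
    """Effect of another agent executing '<receiver> believe that <fact>'."""
    receiver.beliefs.add(fact)

def who_believes(individuals, fact):
    """Group formed by a query such as '?who ask if believe that (blue sky)'."""
    return frozenset(ind.name for ind in individuals if fact in ind.beliefs)

fred, tom, mary = Individual("Fred"), Individual("Tom"), Individual("Mary")
tom.beliefs.add(("blue", "sky"))
send_belief(fred, ("blue", "sky"))     # Tom instructs Fred to believe (blue sky)
mary.beliefs.add(("raining",))

everyone = frozenset({"Fred", "Tom", "Mary"})
who = who_believes([fred, tom, mary], ("blue", "sky"))
others = everyone - who                # excluding members by set difference
```

Here `who` plays the role of the variable ?who in the query above, and set
difference stands in for one of the Boolean algebra operators on groups.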

Determining  Behaviour 

The  behaviour  of  each  individual  is  determined  by  the  propositional  attitude  instructions  that  it 
invokes  in  response  to  social  and  environmental  cues.  Each  individual  is  designed  to  exhibit  certain 
routine  behaviours  that  are  applicable  to  its  domain  of  expertise.  These  are  coded  in  routines, 
which  comprise  an  atomic  proposition  goal  and  a  state  transition  network  of  propositional  attitude 
instructions.  Routine  execution  involves  navigating  control  through  the  transition  network  of 
instructions,  with  each  instruction  succeeding  or  failing.  The  routine  is  designed  so  that  the  atomic 
proposition  goal  will  be  satisfied  if  a  successful  path  of  execution  can  be  found  from  the  start  node 
to  a  terminal  node  of  the  network.  Routines  provide  a  procedural  approach  to  knowledge 
representation  [5]. 

In  addition,  the  believe  attitude  accommodates  Horn  clause  beliefs  for  reasoning.  Instructions  such 
as Fred believe that (son ?x ?y) if (& (male ?x) (parent ?y ?x)) then make Fred believe the
propositional expression (son ?x ?y) if (& (male ?x) (parent ?y ?x)). Each individual is able to engage
their  inference  engine  to  reason  about  rules  of  this  sort.  Beliefs  provide  a  declarative  approach  to 
knowledge  representation  [5]. 
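
To show what reasoning with such a Horn clause involves, the following minimal
sketch applies the rule (son ?x ?y) if (& (male ?x) (parent ?y ?x)) to a small
set of ground facts. The facts, the function names and the naive matching
strategy are assumptions for illustration; the paper does not describe how the
ATTITUDE inference engine works internally.

```python
def match(pattern, fact, bindings):
    """Unify a flat pattern such as ('male', '?x') with a ground fact."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None          # variable already bound to something else
        elif p != f:
            return None
    return bindings

def solve(body, facts, bindings=None):
    """Yield every variable binding that satisfies all patterns in `body`."""
    bindings = {} if bindings is None else bindings
    if not body:
        yield bindings
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, facts, extended)

def apply_rule(head, body, facts):
    """Return the ground instances of `head` derivable from `facts`."""
    return {tuple(b.get(term, term) for term in head)
            for b in solve(body, facts)}

facts = {("male", "Tom"), ("parent", "Mary", "Tom"), ("parent", "Mary", "Ann")}
head = ("son", "?x", "?y")
body = [("male", "?x"), ("parent", "?y", "?x")]
derived = apply_rule(head, body, facts)
```

Only Tom is derived as a son of Mary, since nothing states that Ann is male.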


* In mathematical terms this means that a set theoretic meta-theory for the model theory [3] of an ATTITUDE
ontology may need to be a non-well-founded set theory [4].

5. Automated Awareness


To  succeed  as  an  SA  adviser,  an  ATTITUDE  individual  must  be  capable  of  assembling  its  own 
awareness  of  the  world.  When  engaging  the  world,  we  rarely  attend  to  individual  facts  in  the 
world. 


Figure  2:  Animated  Event  and  Scenario 


In  assessing  the  typical  mental  snapshot  picture  animated  in  the  left  of  Figure  2,  we  are  inclined  to 
represent  it  as  a  set  of  facts,  perhaps  as 

E1 = {(at loc11 (blue_asset tgt3)), (at loc42 (faker tgt7)),
      (at loc42 (low_altitude tgt7)), (at loc42 (approaching tgt7 tgt3))}.

Sets  of  facts  are  termed  events  in  ATTITUDE  and  Boolean  algebra  combinations  of  events  are 
formed  as  scenarios.  The  right  of  Figure  2  illustrates  a  scenario  formed  from  two  events.  The  term 
situation  is  used  collectively  for  events  and  scenarios.  Situation  awareness  is  about  being  aware  of 
situations,  not  facts  or  objects  per  se,  and  all  inference  within  ATTITUDE  is  conducted  relative  to 
situations. Among other things, this allows ATTITUDE individuals to perform “what if” and
knowingly  counterfactual  reasoning,  by  reasoning  about  possible  events  with  known  scenarios.  [6] 
expands  upon  the  simplified  details  presented  here. 
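
A simple way to picture events and scenarios is to model an event as a set of
facts and to combine events with set operations. The sketch below does this for
the event E1 given above and for a second, invented event E2. Treating the
Boolean algebra combinations as plain set union, intersection and difference is
an assumption made for illustration only; [6] gives the actual treatment.

```python
# Event E1 as transcribed from the text: a set of facts, each a nested tuple.
E1 = frozenset({
    ("at", "loc11", ("blue_asset", "tgt3")),
    ("at", "loc42", ("faker", "tgt7")),
    ("at", "loc42", ("low_altitude", "tgt7")),
    ("at", "loc42", ("approaching", "tgt7", "tgt3")),
})

# Hypothetical second event, invented for this example.
E2 = frozenset({
    ("at", "loc42", ("faker", "tgt7")),
    ("at", "loc50", ("hostile", "tgt7")),
})

either = E1 | E2        # scenario containing the facts of both events
shared = E1 & E2        # facts common to both events
only_first = E1 - E2    # facts of E1 not present in E2

def holds(fact, situation):
    """True if `fact` is part of the given event or scenario."""
    return fact in situation
```

In this picture a "what if" query amounts to testing `holds` against a
scenario built from a known event and a hypothesised one.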

The author recommends a five-step approach in devising an ATTITUDE individual.

Figure 3: Automating Awareness

Step  1  involves  deciding  the  type  of  objects  and  relations  that  lie  within  the  scope  of  the 
individual’s  expertise.  This  is  best  done  formally,  so  that  the  logical  dependencies  between  those 
object  and  relation  types  are  well  defined.  The  individual  will  not  understand  facts  that  cannot  be 
expressed  in  those  terms.  Step  2  involves  deciding  the  foreseeable  events  that  are  of  interest  to  the 
individual.  These  events  may  comprise  a  lattice  structure  if  fragments  of  events  of  interest  are  also 
of  interest.  For  each  event  of  interest,  one  or  more  routines  are  defined  to  identify  when  those 
events  have  occurred  and  to  identify  the  individual’s  involvement  with  those  events.  The  situations 
accommodated by step 2 will comprise all the situations which can be generated from the events
of interest and which can be serviced collectively by the routines designed to service the events
comprising them.

As  depicted  in  the  top  left  of  Figure  3,  routines  for  the  foreseeable  events  may  not  suffice  as  the 
unforeseeable  might  occur.  To  guard  against  this,  step  3  advocates  further  expanding  the  set  of 
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routines  so  that  they  include  a  topologically  well  separated  set  for  which  the  space  of  possibilities 
is approximately covered. The inclusion of routines that roughly accommodate any situation
increases the likelihood of any given situation being satisfactorily handled by the individual, but as
the top right of Figure 3 shows, it provides no guarantee. Consequently, as illustrated in the
bottom  left  of  Figure  3,  step  4  is  to  allow  the  routines  to  operate  with  tolerance,  so  that  each  can 
be  successfully  executed  with  events  similar  to  those  intended.  This  secures  completeness,  in  the 
sense  that  the  individual  will  always  respond  to  any  given  situation,  though  possibly  with  a 
degraded level of performance for unforeseen situations. Two approaches to tolerance have been
developed for ATTITUDE, one based upon fuzzy inference ([7]) and the other upon Bayesian
inference ([8]). Only the latter has been implemented to date. It involves the association of
conditional  probability  tables  with  beliefs  so  that  the  internal  structures  produced  by  the  inference 
engine  can  be  interpreted  as  Bayesian  networks,  thereby  enabling  the  probability  of  the  query, 
including  conditional  queries,  to  be  computed. 

The  fifth  and  final  step  is  to  accept  that,  while  the  routines  of  step  4  will  always  be  able  to  deal 
with  any  situation  that  can  arise,  their  manner  of  dealing  with  some  situations  will  be  less  than 
satisfactory.  Consequently  step  5  is  to  incorporate  routine  adaptation  capabilities,  as  depicted  in 
the  bottom  right  of  Figure  3.  A  case-based  approach  for  ATTITUDE  was  initiated  in  [9].  It  applied 
weakest  precondition  semantics  to  the  execution  traces  of  ATTITUDE  routines  to  learn  the 
conditions  under  which  each  routine  is  likely  to  succeed,  and  then  applied  the  Viterbi  algorithm  to 
select  the  routine  that  is  most  likely  to  achieve  a  nominated  goal  in  the  current  situation.  Another 
approach,  based  upon  routine  generation  through  genetic  algorithms,  is  about  to  commence. 
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A  common  focusing  algorithm  for  Ultra- Wideband,  Wide-Angle  (UWBWA) 
SAR  systems  is  the  delay-sum  backprojector  which  coherently  sums  the 
SAR  data.  It  is  very  robust  to  perturbations  of  the  radar  platform,  but 
does require a huge amount of computation. In this paper, the Quadtree
Backprojection algorithm, which approximates the backprojector, is presented
and its characteristics are analyzed. The Quadtree algorithm has
log2 N stages, so it is possible to measure the focusing performance of each
intermediate stage and perform early detection of targets.


1.  INTRODUCTION 

Various SAR focusing algorithms have been developed for low frequency, wideband
radar, because low frequency (less than 1 GHz) signals have much better
foliage penetration (FOPEN) and ground penetration (GPEN) capability, making
them well-suited to detecting camouflaged targets or buried targets (e.g., land
mines). At these low frequencies, the requirement of fine cross-range resolution
demands a very long synthetic aperture. In turn, this leads to a very large integration
angle, which increases the possibility that severe motion errors occur during
the SAR data collection process. As a result, fast imaging algorithms based on the
FFT are handicapped by non-uniform spatial sampling of the collected data.


2.  DESCRIPTION  OF  QUADTREE  ALGORITHM 

The impulse response of the UWBWA SAR data collection process occupies a
hyperbolic contour in the space-time domain, because the energy which was originally
concentrated in a single point target in the space domain (x, z) spreads out
over a hyperbola in the space-time domain (x, t). For each point (x, z) in the image,
the delay-sum backprojection algorithm coherently sums the collected SAR data

^ Prepared through collaborative participation in the Advanced Sensors Consortium sponsored by the U.S. Army
Research Laboratory under Cooperative Agreement DAAL01-96-2-0001.
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along  this  hyperbola  in  the  space-time  domain: 

$$I(x,z) = \sum_{a=0}^{N-1} D\bigl(a,\, t_a(x,z)\bigr) \qquad (1)$$

where $I(x,z)$ is the output image, $D(a,t)$ is the radar data recorded at the $a$-th
aperture point, and $t_a(x,z)$ is the time delay corresponding to the distance between
the image position $(x,z)$ and the $a$-th aperture point. When the computed time
delay lies between signal samples, a bilinear interpolation is done to get the value
for the summation (1). Backprojection is like a matched filter, but it is a space-variant
computation since the shape of the hyperbola changes with range.
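
The summation (1) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a straight-line aperture along the x axis, a two-way delay of 2R/c in free space, and simple linear interpolation between time samples (swapped in for the interpolation the text mentions). The function names, the sampling rate `fs` and the toy point-target data are all hypothetical.

```python
import math

C = 3.0e8  # assumed propagation speed in m/s (free space)


def sample_at(trace, t, fs):
    """Linearly interpolate one recorded trace at time t (seconds)."""
    pos = t * fs
    i = math.floor(pos)
    if i < 0 or i + 1 >= len(trace):
        return 0.0
    frac = pos - i
    return (1.0 - frac) * trace[i] + frac * trace[i + 1]


def backproject(data, aperture_x, xs, zs, fs):
    """Delay-sum image: I(x, z) = sum over a of D(a, delay_a(x, z))."""
    image = [[0.0 for _ in xs] for _ in zs]
    for iz, z in enumerate(zs):
        for ix, x in enumerate(xs):
            total = 0.0
            for a, xa in enumerate(aperture_x):
                delay = 2.0 * math.hypot(x - xa, z) / C  # assumed two-way path
                total += sample_at(data[a], delay, fs)
            image[iz][ix] = total
    return image


# Toy data: one ideal point target seen by 17 aperture positions.
fs = 1.0e9                                   # assumed 1 GHz sampling rate
aperture_x = [float(x) for x in range(-8, 9)]
target = (0.0, 30.0)
data = []
for xa in aperture_x:
    trace = [0.0] * 256
    delay = 2.0 * math.hypot(target[0] - xa, target[1]) / C
    trace[round(delay * fs)] = 1.0           # impulse on the hyperbola
    data.append(trace)

xs = [-4.0, -2.0, 0.0, 2.0, 4.0]
zs = [28.0, 30.0, 32.0]
image = backproject(data, aperture_x, xs, zs, fs)
peak = max(((iz, ix) for iz in range(len(zs)) for ix in range(len(xs))),
           key=lambda p: image[p[0]][p[1]])
print(peak)  # index of the brightest pixel
```

The three nested loops make the cost of the direct method visible: every pixel visits every aperture position, which is the motivation for the Quadtree approximation.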

There are two important advantages of the delay-sum backprojection algorithm:
(1) simple motion compensation, and (2) localized processing artifacts. The
simplicity of the computation, which relies on a distance calculation, means that
there is no requirement to have a regularly spaced array, as required in FFT-based
algorithms. However, there might be a need to change the relative weighting when
doing the coherent summation to avoid “hot spots” in the final image.

The most notable disadvantage of backprojection is its computational complexity
(order $N^3$ for $N \times N$ pixels and $N$ sensors), as opposed to FFT-based algorithms such
as the ω-k method (order $N^2 \log N$).

2.1.  Quadtree  Algorithm 

A  Quadtree  is  a  general  way  of  representing  an  algorithm,  or  data  structure, 
in  a  hierarchical  tree  structure.  The  Quadtree  for  UWBWA  SAR  Processing  was 
first  introduced  in  [1]  where  a  divide  and  conquer  decomposition  of  the  Delay-sum 
Backprojector  was  described.  The  radix-2  Quadtree  algorithm  has  two  features: 

•  Instead  of  coherently  adding  the  data  from  all  sensors,  the  summation  is  done 
two  sensors  at  a  time.  As  a  result,  we  must  form  “virtual  sensors”  from  two 
neighboring  sensors  at  every  iteration  stage. 

•  Instead  of  computing  the  exact  values  of  the  image  at  all  points  in  the  ground 
patch  at  full  resolution,  use  a  multi-resolution  scheme  to  approximate  the  final 
image.  In  this  case,  a  ground  patch  is  divided  iteratively  into  a  2x2  set  of  subpatches 
at  each  stage. 

At each iteration stage, the generation of new aperture points and new sub-images
(sub-patches) can be described as a “parent-child” structure that is typical of
tree-structured recursive algorithms (Fig. 1).

2.2.  Interpolation  and  Beamforming 

The core computation of the Quadtree image former is a two-sensor delay-and-sum
beamformer, where the delay is recalculated at each iteration based on the
distance between the new ground-patch centers and either the parent or child aperture
positions. In the parent nodes, the radar data is aligned for the distance
between the parent aperture positions and the parent ground patch center. When
the beamforming is done, its output data must be aligned for the distance between
the child aperture positions and the child ground-patch centers. These distance
calculations are time-consuming because they involve square roots which then generate
indices into the parent data. More than half of the total operations in the
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PIG.  1.  Tree  structure  of  the  quad-tree  algorithm. 


Quadtree are involved in calculating indices. In general, several parent data values
are selected by the indexing, and bilinear interpolation is done as part of the
two-point beamforming.


3.  OPERATION  COUNT 

The original Quadtree [1] was a radix-2 algorithm, but we have recently extended
the algorithm to a mixed-radix form [2]. In this section, we summarize the number
of operations ($\times$, $\pm$, $\sqrt{\ }$) required in the mixed-radix Quadtree algorithm. We
make the following definitions:

$L$ = number of apertures combined at each stage
$K \times K$ = array of new image centers generated at each stage
$N_{p\ell}$ = number of Parent apertures at the $\ell$-th iteration
$N_{c\ell}$ = number of Child apertures at the $\ell$-th iteration
$N_{tc\ell}$ = number of Time (range) samples in a child node
$M_{p\ell}$ = number of Parent ground patches at the $\ell$-th iteration
$M_{c\ell}$ = number of Child ground patches at the $\ell$-th iteration

We assume that $L = K$, so the aperture dividing factor equals the image patch
expansion. Finally, we assume that we start with an equal number of apertures
and image patch samples, $N = K^p$. The relationships between these numbers are:

$$N_{c\ell} = N_{p\ell}/K = K^{p-\ell}, \qquad M_{c\ell} = M_{p\ell} K = K^{\ell}, \qquad N_{tc\ell} \approx K^{p-\ell}$$

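The following four steps are analyzed:

1. Distance from each child aperture to the child ground-patch centers requires
multiply, add and square roots to compute $N_{c\ell} M_{c\ell}^2$ distances at each stage:

$$\sum_{\ell=1}^{p} N_{c\ell} M_{c\ell}^2 = \sum_{\ell=1}^{p} K^{p+\ell} = \frac{K}{K-1}\left(K^{2p} - K^{p}\right) \approx \frac{K^{2p+1}}{K-1}$$

2. The distance from Parent apertures to Child range bins, which is done $N_{tc\ell} N_{p\ell} M_{c\ell}^2$
times at each stage:

$$\sum_{\ell=1}^{p} N_{tc\ell} N_{c\ell} M_{c\ell}^2 K = \sum_{\ell=1}^{p} K^{2p+1} = p K^{2p+1}$$

3. Interpolation, which only needs multiplications and additions and is done $N_{tc\ell} N_{p\ell} M_{c\ell}^2$
times at each stage:

$$\sum_{\ell=1}^{p} N_{tc\ell} N_{p\ell} M_{c\ell}^2 = p K^{2p+1}$$

4. Coherent Summation, which is done $N_{c\ell}(K - 1)$ times at each stage:

$$\sum_{\ell=1}^{p} N_{c\ell}(K - 1) = \sum_{\ell=1}^{p} (K - 1) K^{p-\ell} = K^{p} - 1$$

Thus, the total number of computations is:

$$\frac{K^{2p+1}}{K-1} + 2pK^{2p+1} + K^{p} - 1 \approx N^2\left(\frac{K}{K-1} + 2K \log_K N\right) \qquad (p = \log_K N)$$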
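
The closed forms for the four steps can be checked numerically. The sketch below is a sanity check under stated assumptions rather than part of the original analysis: it takes the stage sizes to be $K^{p-\ell+1}$ parent apertures, $K^{p-\ell}$ child apertures, about $K^{p-\ell}$ time samples and $K^{\ell}$ child patches per image dimension, and the function names are hypothetical.

```python
def stage_sizes(K, p, stage):
    """Assumed sizes at a given stage (1-based) of a radix-K Quadtree."""
    n_parent = K ** (p - stage + 1)   # parent apertures
    n_child = K ** (p - stage)        # child apertures
    n_time = K ** (p - stage)         # time samples per child node (approx.)
    m_child = K ** stage              # child patches per image dimension
    return n_parent, n_child, n_time, m_child


def operation_counts(K, p):
    """Sum the per-stage counts of the four analyzed steps."""
    dist_child = dist_parent = interp = summation = 0
    for stage in range(1, p + 1):
        n_parent, n_child, n_time, m_child = stage_sizes(K, p, stage)
        dist_child += n_child * m_child ** 2
        dist_parent += n_time * n_parent * m_child ** 2
        interp += n_time * n_parent * m_child ** 2
        summation += n_child * (K - 1)
    return dist_child, dist_parent, interp, summation


def closed_forms(K, p):
    """Closed-form values of the same four sums."""
    return (K * (K ** (2 * p) - K ** p) // (K - 1),
            p * K ** (2 * p + 1),
            p * K ** (2 * p + 1),
            K ** p - 1)


def approximate_total(K, p):
    """N^2 (K/(K-1) + 2 K log_K N) with N = K^p."""
    N = K ** p
    return N ** 2 * (K / (K - 1) + 2 * K * p)


counts = operation_counts(2, 3)
print(counts, sum(counts), approximate_total(2, 3))
```

For K = 2 and p = 3 (N = 8) the exact total is 887 operations against an approximate value of 896, so the approximation is already close for a very small array.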


4.  MULTIRESOLUTION  ENERGY  MEASURE 

The internal data structure of the Quadtree algorithm can be written as $u_s[r, a, m, n]$,
where $s$ is the stage number, $r$ is the range bin number, $a$ is the aperture point
index and $(m, n)$ are the coordinates of the ground patch center position. In the
Quadtree, the raw data is a function of only $(r, a)$, but is gradually localized into
the $(x, z)$ coordinates as the iterations progress. The $(r, a)$ dimensions contract
and the $(m, n)$ dimensions expand at each stage, representing finer sampling of the
image patch.

A multiresolution measurement scheme can be constructed by summing
squared data within each subimage patch to obtain an Energy Distribution Function
defined versus $(m, n)$:

$$E_s[m, n] = \sum_{r} \sum_{a} \bigl| u_s[r, a, m, n] \bigr|^2$$

The $s$-th stage has $2^{s-1}$ image centers, so the domain for $(m, n)$ grows with $s$.
Figure 2 shows a set of images of $E_s[m, n]$ at each stage of a radix-2 Quadtree. In
this figure we can see large signal regions in the early stages of the image become
focused in the later stages. As the number of subimage centers grows, the regions
with higher energy show the locations where targets are likely to be found. A new
target detection algorithm has been based on this focusing sequence [5].
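
As a rough illustration of the energy measure, the following sketch sums squared magnitudes over range and aperture for every subimage patch of one stage. The nested-list layout `stage_data[m][n][a][r]` and the toy values are assumptions made for the example only.

```python
def energy_map(stage_data):
    """E[m][n] = sum over apertures a and range bins r of |u[m][n][a][r]|^2."""
    return [[sum(abs(value) ** 2 for trace in patch for value in trace)
             for patch in row]
            for row in stage_data]


def brightest_patch(energy):
    """Return the (m, n) index of the patch with the largest energy."""
    return max(((m, n) for m in range(len(energy)) for n in range(len(energy[m]))),
               key=lambda idx: energy[idx[0]][idx[1]])


# Toy stage with a 2 x 2 grid of patches, 2 apertures and 3 range bins each.
quiet = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
strong = [[3 + 4j, 3 + 4j, 3 + 4j], [3 + 4j, 3 + 4j, 3 + 4j]]
stage_data = [[quiet, strong],
              [quiet, quiet]]

energy = energy_map(stage_data)
print(energy)                   # [[6.0, 150.0], [6.0, 6.0]]
print(brightest_patch(energy))  # (0, 1)
```

Applying the same function to each stage in turn gives a coarse-to-fine sequence of energy maps, which is the kind of sequence an early-detection rule could threshold.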

5.  HARDWARE  IMPLEMENTATION 

In surveillance applications the UWBWA SAR might be fielded in an unmanned
airborne platform, so the constraints of power and size will drive the implementation.
Our approach is to explore the use of a Context Switching Reconfigurable
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FIG.  2.  Energy  Measure  for  all  levels  of  the  Quadtree  (first  stage  is  the  raw  data). 

Computer (CSRC) being developed by Lockheed-Sanders under the DARPA Adaptive
Computing Systems program.

A CSRC has multiple layers of logic, called contexts, which can be switched
in a single clock cycle. Each context implements a particular function (multiply-add,
DSP operation, I/O, etc.). As an algorithm executes, it switches context
to accelerate a given function. The device allows for data to be shared between
contexts and also has RAM to implement local data storage. We are developing only
a single node to benchmark the acceleration of the distance and index calculations
for the inner loops of the Quadtree.

5.1.  Jump  Level 

Our approach to accelerating the Quadtree is to run several stages of the algorithm
and then to switch to another algorithm like ω-k or backprojection. If we do most
of the Quadtree iterations then the FFTs for the ω-k algorithm should be small
enough to be run on individual processors.

It  is  important  to  realize  that  the  Quadtree  algorithm  produces  data  in  aperture 
and  range,  not  in  image  coordinates.  If  a  final  high-resolution  image  is  desired  there 
are  two  choices.  The  Quadtree  algorithm  can  be  carried  out  to  the  trivial  case  of 
one  aperture,  one  pixel  per  image  section,  and  one  range  cell  per  image  section. 

As a second option the aperture-range data in each subimage can be focused using
either delay-sum or ω-k backprojection. The number of iterations of the Quadtree
algorithm employed before final focusing is referred to as the jump level. In the
hardware effort by Georgia Tech, Sanders, and ARL to create a real-time UWB SAR
imager, delay-sum backprojection is used to create a final high-resolution image
from Quadtree data. This technique has proven to be faster and more accurate
than the Quadtree algorithm alone [3].

The  squared  distance  calculations  in  the  Quadtree  algorithm  can  be  formulated 
as  two  real  adds  per  range  cell,  with  significant  overhead  per  subimage-aperture 
combination.  This  is  very  effective  for  the  larger  subimages,  but  the  overhead 
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Computation  time  vs.  stages  of  quadtree  employed 
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Stages  of  quadtree  prior  to  delay-sum  focusing 
FIG.  3.  Computation  time  as  a  function  of  the  number  of  quadtree  levels  used  before  jumping 
to  a  backprojector  to  finish.  In  this  case,  the  computation  starts  with  an  array  of  2048  apertures, 
spaced  by  11.25  cm.  The  image  size  is  1536  (downrange)  x  512  (crossrange).  Downrange  x 
crossrange  resolution  is  3.75  cm  x  11.25  cm. 


becomes burdensome for the smaller subimages in the final few Quadtree stages.
Using this formulation of the Quadtree algorithm, and a delay-sum backprojector
specifically coded for fast imaging of Quadtree data, computation times were compared
for different jump levels from the Quadtree to the delay-sum backprojector.
One result, in this case showing that the Quadtree should be used to reduce a 2048
sensor array to no fewer than 32 apertures, is shown in Fig. 3.

Each subimage can also be focused at later stages using the ω-k backprojector,
but the ω-k algorithm requires sensors to be evenly spaced along a linear array. If
the sensors are necessarily spaced along a wandering path, as in airborne synthetic
arrays, the Quadtree can be used to synthesize an evenly spaced virtual array.
The optimal approach to this depends on the characteristics of the flight path,
but results show that several stages of the Quadtree algorithm should be used
to gradually approach a linear array. Then the ω-k backprojector is expected to
provide the best solution to high-resolution focusing of Quadtree data.
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An  objective  function  that  measures  the  deviation  from  smoothness  for  a 
space-time preprocessor anti-jam filter is developed. A linear combination
of noise eigenvectors is formed to produce the desired space-time weights for
minimizing GPS signal distortion based on minimization of the objective
function. It is demonstrated that the smoothness of the spectrum characterizing
the space-time preprocessor across angle and frequency depends
on the choice of the space-time delay in the objective function.


Key  Words:  GPS;  anti-jam  filter;  preprocessor;  power  minimization;  smoothing 


0.  INTRODUCTION 

GPS  is  known  to  provide  significant  force  enhancement  capability.  This  force 
enhancement  capability  has  been  demonstrated  in  every  U.S.  military  operation 
since  (and  including)  the  Gulf  War,  but  with  this  capability  is  a  concern  about 
the  vulnerability  of  the  GPS  signal  to  jamming.  The  jamming  threat  is  serious 
because  of  the  physical  design  of  the  GPS  system.  The  received  power  from  the 
GPS  satellites  is  approximately  -157  dBW.  Many  jammers  available  on  the  arms 
market  today  either  already  cover  the  GPS  frequencies,  or  can  be  modified  to  do 
so.  A  space-time  preprocessing  filter  prior  to  the  GPS  correlators  is  one  of  several 
proposed  methods  for  suppressing  jammers.  However,  this  type  of  filter  also  induces 
some  distortion  of  the  desired  GPS  signal. 

It  is  known  that  tapped-delay  line  preprocessing  of  a  spread  spectrum  signal 
introduces  distortion  of  the  desired  GPS  signal.  Characterization  of  this  type  of 
distortion in time-only preprocessing has been previously studied in [2]. We here
consider the space-time extension of such an interference suppression algorithm
proposed in [4] to effectively null both wideband and narrowband jammers while
minimizing GPS signal distortion.

1.  POWER  MINIMIZATION  BASED  JOINT  SPACE-TIME 
PREPROCESSOR 

In  the  joint  processing  approach,  each  sample  value  input  to  the  GPS  receiver 
is  formed  from  a  linear  combination  of  samples  across  both  space  and  time.  The 
space-time  weights  are  realized  through  a  tapped-delay  line  behind  each  digitized 
baseband  antenna,  as  shown  in  Figure  1.  The  output  of  the  preprocessor  is  then 
fed  to  a  standard  digital  GPS  receiver.  The  goal  of  the  preprocessor  is  to  suppress 
jammers  as  best  as  possible  while  simultaneously  passing  as  many  undistorted  GPS 
signals as possible. Note that the anti-jam space-time filter will not be optimized
for any one GPS satellite signal in terms of maximizing the SINR. The advantage of
this approach is that the anti-jam space-time filter remains a separate component
so that a standard digital GPS receiver may be employed.

The  criterion  for  determining  the  optimal  set  of  space-time  weights  is  premised  on 
the  fact  that  the  respective  power  levels  of  the  desired  GPS  signals  are  significantly 
below  the  noise  floor,  as  well  as  below  the  respective  power  levels  of  the  potential 
jammers.  The  goal  then  is  to  drive  the  power  of  the  preprocessor  output  down 
to  the  noise  floor.  This  approach  serves  to  place  point  nulls  at  the  respective 
angle-frequency  coordinates  of  strong  narrowband  interferers  and  spatial  nulls  in 
the  respective  directions  of  broadband  interferers. 

In  order  for  the  GPS  receiver  to  provide  accurate  navigation  information,  it  is 
necessary  to  track  the  signals  from  at  least  four  different  GPS  satellites.  Given  the 
parallax  error  associated  with  GPS  satellites  at  near-horizon  relative  to  the  aircraft, 
it  is  generally  desirable  to  track  the  respective  signals  from  a  larger  number  of  GPS 
satellites,  e.g.,  twelve.  It  is  desired  then  that  the  preprocessor  “pass”  unaltered 
as  many  GPS  signals  as  possible.  Thus,  the  magnitude  of  the  multidimensional 
Fourier  transform  of  the  space-time  weights  should  be  as  flat  (smooth)  as  possible 
in the spectrum as a function of frequency and angular dimensions. The goal then
is to achieve a desired smoothness while simultaneously nulling both wideband and
narrowband interferers. This motivates the minimization of an objective function
that measures the deviation from smoothness.

1.1.  Theoretical  Development 

A  space-time  power  minimization  based  preprocessor  for  GPS  was  proposed  in 
[4]  for  anti-jam  protection.  The  output  power  of  the  space-time  preprocessor  is 
minimized  under  the  constraint  that  the  value  of  the  first  tap  of  the  tapped-delay- 
line  behind  the  reference  element  be  unity.  This  is  not  necessarily  the  optimum 
constraint  to  reduce  GPS  signal  distortion  while  nulling  out  the  wideband  and 
narrowband jammers. An alternative approach is taken here in which the space-time
weights are expressed as a linear combination of the noise eigenvectors of the
space-time correlation matrix, thereby ensuring the desired nulling of jammers. The
coefficients for linearly combining the noise eigenvectors are determined as those
which minimize an objective function that measures the deviation from smoothness.

The $NM \times NM$ space-time correlation matrix is denoted $\mathbf{K}$, where $M$ is the
number of antennas and $N$ is the number of taps per antenna. $\mathbf{K}$ can be expressed
as

$$\mathbf{K} = \mathbf{K}_s + \sigma_n^2 \mathbf{I} \qquad (1)$$

where $\sigma_n^2$ is the power of the noise per tap per antenna, assumed without loss of
generality to be both temporally and spatially white. $\mathbf{K}_s$ is the noise-free space-time
correlation matrix. Since the GPS signals are at least 16 dB below the noise
floor prior to the correlation with any one satellite’s PN code, the contributions
of the GPS signals to $\mathbf{K}_s$ are here considered negligible. We here assume that the
number of jammers is such that not all degrees of freedom are consumed for jammer
cancellation purposes. In this case, $\mathbf{K}_s$ is not full rank, so it can be formed
from the $K < NM$ eigenvectors of $\mathbf{K}$ associated with the $K$ largest eigenvalues.

An $NM \times (NM - K)$ matrix $\mathbf{E}_N$ is formed from the $NM - K$ eigenvectors of
$\mathbf{K}$ associated with the smallest eigenvalue (of multiplicity $NM - K$), which is equal to the
noise power $\sigma_n^2$; these are the noise eigenvectors:

$$\mathbf{E}_N = [\mathbf{e}_{K+1}, \mathbf{e}_{K+2}, \ldots, \mathbf{e}_{NM}]. \qquad (2)$$

The 2D FFT of the $j$-th noise eigenvector is expressed in terms of a spatial frequency,
$\mu$, and an angular digital frequency, $\omega$, which are defined as follows. First,
for sake of simplicity, we here assume a linear array of identical antennas equispaced
by $d = \lambda/2$ along a line, where $\lambda$ is the wavelength associated with the L1
frequency. In this scenario, the spatial frequency is defined as

$$\mu = 2\pi \frac{d}{\lambda} \sin\theta = \pi \sin\theta$$

where $\theta$ is the angle-of-arrival (AOA). The angular digital frequency is defined as
$\omega = 2\pi \Delta f / f_s$, where $\Delta f$ is the frequency offset relative to the L1 frequency and $f_s$ is
the sampling rate.

With these definitions, the 2D FFT of the $j$-th noise eigenvector may be expressed as

$$s_j(\mu, \omega) = \mathbf{f}^H(\mu, \omega)\, \mathbf{e}_j \qquad (3)$$

where $\mathbf{f}(\mu, \omega) = \mathbf{f}_M(\mu) \otimes \mathbf{f}_N(\omega)$. Now $\mathbf{f}_M(\mu)$ and $\mathbf{f}_N(\omega)$ are respectively defined as

$$\mathbf{f}_M(\mu) = [1, e^{j\mu}, e^{j2\mu}, \ldots, e^{j(M-1)\mu}]^T \qquad (4)$$

$$\mathbf{f}_N(\omega) = [1, e^{j\omega}, e^{j2\omega}, \ldots, e^{j(N-1)\omega}]^T \qquad (5)$$

Any of the noise eigenvectors, when viewed as a space-time weight vector, places a
null at the angle-frequency coordinate of the $i$-th narrowband jammer

$$\mathbf{f}^H(\mu_i, \omega_i)\, \mathbf{e}_j = 0 \qquad (6)$$

and a spatial null in the direction of the $\ell$-th wideband jammer

$$\mathbf{f}^H(\mu_\ell, \omega)\, \mathbf{e}_j = 0. \qquad (7)$$
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We  desire  to  find  a  linear  combination  of  these  noise  eigenvectors  such  that  the 
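2D FFT spectrum is as smooth as possible. We seek therefore to minimize the
objective function

$$\frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \Bigl| e^{-j(m\mu + n\omega)} - \sum_{j=K+1}^{NM} z_j^* s_j(\mu, \omega) \Bigr|^2 d\mu\, d\omega \qquad (8)$$

with respect to $\mathbf{z} = [z_{K+1}, z_{K+2}, \ldots, z_{NM}]^T$. To minimize (8) let us form a vector

$$\mathbf{s}(\mu, \omega) = [s_{K+1}(\mu, \omega), \ldots, s_{NM}(\mu, \omega)]^T. \qquad (9)$$

We can now redefine our problem as minimizing

$$\frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \bigl| e^{-j(m\mu + n\omega)} - \mathbf{z}^H \mathbf{s}(\mu, \omega) \bigr|^2 d\mu\, d\omega. \qquad (10)$$

We now will utilize the complex vector minimization methods described in [1].
Using the notation from [1], we are solving for $\mathbf{z}$ such that

$$\nabla_{\mathbf{z}} \left[ \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \bigl| e^{-j(m\mu + n\omega)} - \mathbf{z}^H \mathbf{s}(\mu, \omega) \bigr|^2 d\mu\, d\omega \right] = 0. \qquad (11)$$

Applying complex vector differentiation yields

$$\underbrace{\left( \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \mathbf{s}(\mu, \omega)\, \mathbf{s}^H(\mu, \omega)\, d\mu\, d\omega \right)}_{(NM-K) \times (NM-K)} \mathbf{z} = \underbrace{\frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} e^{j(m\mu + n\omega)}\, \mathbf{s}(\mu, \omega)\, d\mu\, d\omega}_{(NM-K) \times 1} \qquad (12)$$

Due to the limits of integration, the matrix on the left hand side of (12) reduces to the identity
matrix and (12) simplifies to

$$\mathbf{z} = \mathbf{E}_N^T \boldsymbol{\delta} \qquad (13)$$

where $\boldsymbol{\delta} = \boldsymbol{\delta}_m \otimes \boldsymbol{\delta}_n$ with $\boldsymbol{\delta}_n = [\delta(n), \delta(n-1), \ldots, \delta(n-(N-1))]^T$ and
$\boldsymbol{\delta}_m = [\delta(m), \ldots, \delta(m-(M-1))]^T$, where $\delta(n)$ is the Kronecker delta function.

This implies that the solution of (8) is based on selecting the same component of
each noise eigenvector where the selected component depends upon the values of
$m$ and $n$. Now (13) is used to linearly combine the noise eigenvectors to form the
space-time weight vector $\mathbf{h}$ where

$$\mathbf{h} = \mathbf{E}_N \mathbf{z}^*. \qquad (14)$$

Any space-time delay factor may be chosen for the objective function.
The following simulations illustrate that choosing the space-time delay as $m = \frac{M-1}{2}$
and $n = \frac{N-1}{2}$ maximizes the desired smoothness of the 2D FFT associated with $\mathbf{h}$.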
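
A small numerical sketch may help make the role of the noise subspace concrete. It rests on several assumptions that go beyond the text: the weight vector is read as the projection of a unit space-time delay vector onto the noise subspace, there is a single narrowband jammer (so the projector has the closed form $\mathbf{I} - \mathbf{a}\mathbf{a}^H / \mathbf{a}^H\mathbf{a}$ and no eigendecomposition is needed), and the sampling rate is taken as 20 MHz. All function names are hypothetical.

```python
import cmath
import math


def steering(mu, omega, M, N):
    """f(mu, omega) = f_M(mu) kron f_N(omega), antenna index outermost."""
    return [cmath.exp(1j * (a * mu + b * omega))
            for a in range(M) for b in range(N)]


def weights_single_jammer(mu_j, omega_j, M, N, m, n):
    """h = (I - a a^H / a^H a) delta for one narrowband jammer (assumed case)."""
    a = steering(mu_j, omega_j, M, N)
    delta = [0.0] * (M * N)
    delta[m * N + n] = 1.0                    # unit space-time delay (m, n)
    a_h_delta = a[m * N + n].conjugate()      # a^H delta
    a_h_a = float(M * N)                      # a^H a, entries have unit modulus
    return [d - ai * a_h_delta / a_h_a for d, ai in zip(delta, a)]


def response(h, mu, omega, M, N):
    """Space-time frequency response f^H(mu, omega) h."""
    f = steering(mu, omega, M, N)
    return sum(fi.conjugate() * hi for fi, hi in zip(f, h))


M, N = 7, 7
m, n = (M - 1) // 2, (N - 1) // 2             # centre delay, m = n = 3
mu_j = math.pi * math.sin(math.radians(-20.0))
omega_j = 2.0 * math.pi * (-5.0e6) / 20.0e6   # assumed 20 MHz sampling rate

h = weights_single_jammer(mu_j, omega_j, M, N, m, n)
null_depth = abs(response(h, mu_j, omega_j, M, N))
passband = abs(response(h, mu_j + 2.0 * math.pi / M, omega_j, M, N))
print(null_depth, passband)
```

In this toy case the response vanishes at the jammer's angle-frequency coordinate and has unit magnitude at a spatial frequency orthogonal to it, which is the qualitative behaviour the objective function is meant to encourage.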

2.  SIMULATIONS 

Two  scenarios  are  presented  to  illustrate  the  importance  of  choosing  the  best 
space-time delay. The simulations employ an M = 7 element equi-spaced linear array
with N = 7 taps at each antenna as depicted in Figure 1. Table 1 summarizes
the values used in both scenarios. The narrowband jammers have different frequency
offsets relative to the L1 frequency. Since we are assuming a 20 MHz receiver
bandwidth at each antenna, the noise floor was determined to be at -130 dBW after
bandpass filtering at each antenna.

The  first  scenario  utilizes  a  space-time  delay  of  m  =  0  and  n  =  0  while  the  second 
scenario  utilizes  a  space-time  delay  of  m  =  3  and  n  =  3.  Both  scenarios  deal  with 
maximizing  the  smoothness  associated  with  the  objective  function  to  generate  h. 
Figure  2  illustrates  the  smoothness  of  the  2D  FFT  spectrum  of  h  associated  with  the 
first  scenario  while  Figure  3  illustrates  the  smoothness  associated  with  the  second 
scenario.  Notice  the  contrast  in  smoothness  associated  with  each  simulation.  While 
both  scenarios  provide  the  desired  nulls  for  both  the  narrowband  and  wideband 
jammers,  the  second  scenario  provides  the  smoothest  2D  spectrum  across  space 
and  frequency  of  all  possible  space-time  delays. 


TABLE 1

Simulation Parameters

Jammer Type    SNR         AOA            Bandwidth
Wideband       -100 dBW    [illegible]    20 MHz

Jammer Type    SNR         AOA            Frequency (rel. to L1)
Narrowband     -110 dBW    [illegible]    -8 MHz
Narrowband     -100 dBW    -20°           -5 MHz
Narrowband     -110 dBW    0°             1 MHz
Narrowband     -105 dBW    [illegible]    5 MHz
Narrowband     -105 dBW    [illegible]    8 MHz

3.  CONCLUSION 

An  objective  function  that  measures  smoothness  for  a  space-time  preprocessor 
was  presented.  By  maximizing  the  smoothness  of  the  objective  function,  a  space- 
time  weight  vector  is  formed  from  a  linear  combination  of  noise  eigenvectors.  It  was 
shown  that  choosing  a  specific  space-time  delay  in  the  objective  function  exhibited 
the  desired  smoothness  across  space  and  frequency,  thereby  minimizing  GPS  signal 
distortion. 
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Abstract 

In  this  paper,  we  investigate  the  use  of  Hidden  Markov  Models  for  detection  and  recognition  of  Anti-Tank 
Guided  Missiles  (ATGMs).  ATGMs  produce  thermal  energy  that  can  be  received  by  infrared  sensors.  The  problem 
is to recognize the ATGM thermal signals in IR sensor data as quickly as possible. The algorithm presented in
this paper employs temporal processing followed by a Hidden Markov Model classifier. The performance of the
approach  is  measured  experimentally  on  a  set  of  synthetically  generated  IR  images  with  embedded  ATGM  thermal 
signatures. 


1  Introduction 

Anti-Tank  Guided  Missiles  (ATGMs)  pose  a  dangerous  threat  to  tank  crews.  ATGM  detection  methods  attempt  to 
provide  warning  to  the  crew  so  that  evasive  action  can  be  taken  or  countermeasures  can  be  launched  to  neutralize 
the  threat.  Working  with  available  on-board  IR  sensors,  these  algorithms  must  be  able  to  detect  ATGM  firings  as 
early  in  the  game  as  possible  to  allow  time  for  appropriate  response.  ATGM  thermal  signatures,  as  measured  using 
high  quality  sensors,  have  a  well-defined  characteristic  that  can  be  used  to  discriminate  threats  from  non-threats 
(detection)  as  well  as  discriminate  one  type  of  ATGM  from  another  (classification). 

Examples of ATGM signals (intensity vs. time) are plotted in Figure 1. The first one, corresponding to the
HOT missile, has two “bright” peaks at the beginning of the signature. These are a consequence of the two-stage
motor of this missile. The second signature characterizes the MILAN missile. It has a single-stage motor, which
results in a profile with a single peak. Finally, the TOW missile is propelled by a two-stage motor with a .05 second
boost and a .1 second sustain stage. ATGM signatures as seen through on-board IR sensors have low signal to noise
ratios (SNRs), typically resulting in either poor detection rates or unreasonably high false alarm rates. Examples of
this are shown in Figure 2, which depicts the noisy signatures obtained from the IR sensor for HOT, MILAN, and
TOW ATGMs. The low SNRs make classification a challenging problem. In addition, noise in the background can make
detection difficult as well. Figure 3 shows three examples of time histories for randomly selected background pixels.
As can be seen, these pixels have signatures that can easily be confused with those of an ATGM.

The goal of an ATGM warning system is to detect and classify a missile threat as soon as possible so that
appropriate measures can be taken. Given that typical ATGM flight times are generally only slightly more than
10 seconds, it is desirable to perform detection in one second or less. As it turns out, the onboard IR sensor used
for detection has limited spatial resolution. This, in conjunction with the missile's linear trajectory being pointed
directly toward the sensor, results in the signature information residing in a single pixel position. That is, each pixel
position (as a function of time) represents either noise or a potential ATGM signature. Consequently, we examine
the 1-D time histories associated with the pixel positions in an attempt to detect and classify ATGM signatures.

*This work was sponsored in part by US-Army CECOM under contract DAAB07-94-C-M756.

Figure 1: Examples of ideal ATGM sequences.

Figure 2: Examples of ATGM sequences as observed through on-board IR sensors.

Owing to constraints on computational complexity, we approach the detection problem in a hierarchical fashion,
which allows the processing to focus on the image regions that most likely represent an ATGM threat. In the first
stage of the approach, we employ a simple temporal processing routine to remove obvious noise/clutter pixel positions
from further consideration. This first stage effectively converts the 3-D IR data into a set of 1-D candidate signals.

The  temporal  processing  procedure  consists  of  first  filtering  along  the  temporal  dimension  generally  on  a  per  pixel 
basis.  Each  pixel  in  the  filtered  frame  is  compared  with  a  spatially  weighted  threshold.  If  the  pixel  value  exceeds 
the threshold, the pixel location is marked as an ATGM candidate location and passed on for additional testing. All
pixel  positions  failing  this  threshold  test  are  removed  from  further  consideration. 
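
The screening step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not specify the filter coefficients or how the threshold is weighted spatially, so the first-difference taps, the `scale` factor, and the frame-wide mean used for the threshold are all hypothetical choices, as are the function names.

```python
def temporal_filter(history, taps):
    """Apply FIR taps to one pixel's time history; taps[0] weights the newest sample."""
    recent = history[-len(taps):]
    return sum(c * x for c, x in zip(taps, reversed(recent)))


def candidate_pixels(frames, taps, scale=4.0):
    """Return (row, col) positions whose filtered value exceeds the threshold.

    frames is a list of 2-D lists indexed as frames[time][row][col].
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    filtered = [[temporal_filter([f[r][c] for f in frames], taps)
                 for c in range(cols)] for r in range(rows)]
    # Hypothetical threshold: a multiple of the frame-wide mean absolute response.
    mean_abs = sum(abs(v) for row in filtered for v in row) / (rows * cols)
    threshold = scale * mean_abs
    return [(r, c) for r in range(rows) for c in range(cols)
            if filtered[r][c] > threshold]


# Toy data: a flat 4 x 4 scene in which one pixel brightens in the last frame.
frames = [[[1.0] * 4 for _ in range(4)] for _ in range(5)]
frames[-1][2][1] = 10.0
candidates = candidate_pixels(frames, taps=[1.0, -1.0])  # first-difference filter
print(candidates)
```

Only the surviving positions would be handed to the classifier, which is what keeps the second stage computationally affordable.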

One can also perform the temporal processing hierarchically by dividing the image frames into contiguous
sub-blocks, computing the average for the sub-blocks, and then applying the temporal filtering to these block averages.
Temporally filtered blocks whose pixels exceed a spatially weighted threshold are then subdivided into smaller blocks
with repeated application of the temporal filtering. Hierarchical processing of this type can be effective when ATGM
signal variations straddle two or more pixel positions.

Figure 3: Examples of background noise signals.


2  HMMs  for  ATGM  Detection  and  Classification 

With  the  IR  images  now  converted  into  a  set  of  1-D  time  signals,  the  task  at  hand  is  to  determine  if  any  of  these 
signals  constitutes  a  threat  and,  if  so,  determine  the  type  of  threat.  This  is  an  M-class  classification  problem,  where 
M  -  1  of  the  classes  correspond  to  ATGMs,  and  the  remaining  class  constitutes  noise  signatures.  To  address  this 
classification  problem,  we  investigated  the  use  of  Hidden  Markov  models  (HMMs),  as  this  technique  has  proven  to 
be  effective  in  speech  recognition  and  target  recognition  problem  areas  [1,  2,  3].  For  this  application,  we  view  the 
mechanics  associated  with  the  ATGM  engines  as  a  process  that  produces  a  heat  signature.  The  heat  signature  is 
observed  by  the  IR  sensor,  and  is  the  only  observation  available  to  us.  We  assume  that  the  engine  process  can 
be modeled as Markov, that is, represented reasonably well by a finite state machine where at every time instance
a transition is made between states, giving a state sequence $q = q_0, q_1, \ldots, q_T$, and an observation sequence
$O = o_1, o_2, \ldots, o_T$ is generated according to a probability density function associated with that state.

The basic assumption of the HMM is that the successive observation samples produced by the same state are
independent of each other and of the time $t$, and that the probability of a state transition depends only
on the previous state [4]. Therefore, the joint probability of the state sequence $q$ being generated by the Markov
model and the observation sequence $O$ being generated by that state sequence can be calculated by the product

$$P(O, q \mid \pi, A, B) = \pi_{q_0} \prod_{t=1}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t),$$

where $\pi_i = P(q_0 = i)$ is the probability of the initial state, $A = [a_{ij}]$ is the transition probability matrix,
and $b_{q_t}(o_t)$ is the probability density of the observation $o_t$ at state $q_t$.
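
As a small numerical illustration of this product, the sketch below evaluates the joint probability for one fixed state path. The two-state model, its Gaussian state densities and the function names are hypothetical and are not the models trained in this paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Scalar Gaussian density used here as the per-state observation density."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def joint_probability(pi, A, densities, states, observations):
    """P(O, q | pi, A, B) for states = [q_0, ..., q_T], observations = [o_1, ..., o_T]."""
    p = pi[states[0]]
    for t in range(1, len(states)):
        # One transition q_{t-1} -> q_t followed by the emission of o_t from q_t.
        p *= A[states[t - 1]][states[t]] * densities[states[t]](observations[t - 1])
    return p


# Hypothetical two-state model: state 0 emits around 0, state 1 around 3.
pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.0, 1.0]]
densities = [lambda o: gaussian_pdf(o, 0.0, 1.0), lambda o: gaussian_pdf(o, 3.0, 1.0)]
p = joint_probability(pi, A, densities, states=[0, 0, 1], observations=[0.0, 3.0])
print(p)
```

Here the result is 0.6 x 0.4 times the two peak density values, i.e. 0.24 / (2 pi).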

2.1 Estimation of the HMM parameters

To  employ  HMMs  for  the  ATGM  classification  problem,  the  parameters  are  chosen  to  optimize  a  specific  criterion, 
given  the  training  data.  The  general  approach  is  to  use  Maximum  Likelihood  Estimation  (MLE),  which  attempts  to 
maximize  the  likelihood  of  the  training  data  given  the  model  of  the  correct  class.  MLE  is  usually  achieved  by  using 
the Expectation-Maximization (EM) algorithm, an example of which is the Baum-Welch algorithm [4].

Assume that the quantities available are the observations $O$, the latest estimates of the unknown probabilities in
the model $\lambda$, and an assumed initial-state distribution $\pi_i$, $i = 1, \ldots, N$. The procedure sets up a
reestimation formula whose result is at least as good as the previous estimate. Baum first proved that the solution to
this problem is to maximize an auxiliary function, which leads to an increase in the overall likelihood of a new set of
HMM parameters. The auxiliary function can then be expanded and separated into components based on individual
parameters. These components lead to closed-form solutions for the reestimation of the model set parameters. To
compute the likelihood efficiently, we use a recursion involving forward and backward (FB) probabilities. The
procedure underlying the computation of FB probabilities is as follows.

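1. Forward probability:

For $t = 1, \ldots, T$; $i = 1, \ldots, N$, the forward probability $\alpha_t(i)$, which represents the probability of the
partial observation sequence $o_1, o_2, \ldots, o_t$ and state $S_i$ at time $t$, given the model $\lambda$, is computed using:

(a) Initialization:

$$\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N.$$

(b) Induction:

$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1}), \quad 1 \le t \le T-1, \quad 1 \le j \le N.$$

(c) Termination:

$$p(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i).$$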

2. Backward Probability:

For $t = T, T-1, \ldots, 1$; $i = 1, \ldots, N$, the backward probability $\beta_t(i)$, which represents the probability of the
partial observation sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given state $S_i$ at time $t$ and the model $\lambda$, can be
computed by:

(a) Initialization:

$$\beta_T(i) = 1, \quad 1 \le i \le N.$$

(b) Induction:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \quad 1 \le i \le N.$$
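
The two recursions can be sketched as follows. This is a minimal, unscaled illustration: the observation densities are assumed to be precomputed as `dens[t][i]` (the density of the observation at time step t in state i, with t counted from zero), and the numbers in the toy model are hypothetical. A practical implementation for long sequences would add scaling or work in the log domain to avoid underflow.

```python
def forward(pi, A, dens):
    """Return the table alpha[t][i] and the likelihood p(O | model)."""
    n_states, n_steps = len(pi), len(dens)
    alpha = [[pi[i] * dens[0][i] for i in range(n_states)]]           # initialization
    for t in range(1, n_steps):                                       # induction
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(n_states)) * dens[t][j]
                      for j in range(n_states)])
    return alpha, sum(alpha[-1])                                      # termination


def backward(A, dens):
    """Return the table beta[t][i]; the last row is all ones."""
    n_states, n_steps = len(A), len(dens)
    beta = [[1.0] * n_states for _ in range(n_steps)]                 # initialization
    for t in range(n_steps - 2, -1, -1):                              # induction
        beta[t] = [sum(A[i][j] * dens[t + 1][j] * beta[t + 1][j] for j in range(n_states))
                   for i in range(n_states)]
    return beta


# Hypothetical two-state model and three precomputed observation densities.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
dens = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]]
alpha, likelihood = forward(pi, A, dens)
beta = backward(A, dens)
print(likelihood)
```

A useful consistency check is that the sum over states of alpha[t][i] * beta[t][i] equals the likelihood at every time step.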

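The reestimation formulas for the coefficients of the mixture density, i.e., $c_{jk}$, $\mu_{jk}$ and $\Gamma_{jk}$, are

$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j,k)},$$

$$\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, o_t}{\sum_{t=1}^{T} \gamma_t(j,k)},$$

$$\bar{\Gamma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, (o_t - \mu_{jk})(o_t - \mu_{jk})^{\top}}{\sum_{t=1}^{T} \gamma_t(j,k)},$$

where $\gamma_t(j,k)$ is the probability of being in state $j$ at time $t$ with the $k$-th mixture component accounting
for $o_t$, i.e.,

$$\gamma_t(j,k) = \left[ \frac{\alpha_t(j)\, \beta_t(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)} \right]
\left[ \frac{c_{jk}\, \mathcal{N}(o_t, \mu_{jk}, \Gamma_{jk})}{\sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o_t, \mu_{jm}, \Gamma_{jm})} \right],$$

and $\mathcal{N}$ is a Gaussian density, with mean vector $\mu_{jm}$ and covariance matrix $\Gamma_{jm}$ for the $m$-th mixture
component in state $j$. The mixture gains $c_{jm}$ satisfy the stochastic constraint

$$\sum_{m=1}^{M} c_{jm} = 1, \quad 1 \le j \le N,$$

$$c_{jm} \ge 0, \quad 1 \le j \le N, \quad 1 \le m \le M,$$

so that the PDF is properly normalized.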

The  recognition  of  an  unknown  firing  sequence  involves  computing  the  likelihood  of  the  observation  sequence 
given  each  HMM  model.  The  likelihood  computation  takes  the  most  likely  state  sequence  through  the  model  for  the 
final  score.  For  this  we  use  the  Viterbi  algorithm,  its  main  advantage  being  that  it  considers  computation  of  disjoint 
paths  separately. 
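
The recognition step can be illustrated with a log-domain Viterbi score per model followed by a maximum over models. The sketch below is a simplification under stated assumptions: each state uses a single scalar Gaussian rather than the mixture densities described above, and the two toy models (`single_peak` and `noise`) are hypothetical stand-ins, not the trained ATGM models.

```python
import math


def log_gauss(x, mean, std):
    """Log of a scalar Gaussian density."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2


def safe_log(p):
    """log(p) with log(0) mapped to minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi_score(obs, pi, A, means, stds):
    """Log-probability of the single most likely state path through one model."""
    n = len(pi)
    delta = [safe_log(pi[i]) + log_gauss(obs[0], means[i], stds[i]) for i in range(n)]
    for o in obs[1:]:
        delta = [max(delta[i] + safe_log(A[i][j]) for i in range(n))
                 + log_gauss(o, means[j], stds[j]) for j in range(n)]
    return max(delta)


def classify(obs, models):
    """Score the sequence against every model and return the best-scoring label."""
    scores = {name: viterbi_score(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical models: a left-to-right low/high/low pulse and a one-state noise model.
models = {
    "single_peak": ([1.0, 0.0, 0.0],
                    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
                    [0.0, 5.0, 0.0], [1.0, 1.0, 1.0]),
    "noise": ([1.0], [[1.0]], [0.0], [1.0]),
}
label_pulse, _ = classify([0.1, -0.2, 5.1, 4.8, 0.3, 0.0], models)
label_flat, _ = classify([0.1, -0.3, 0.2, 0.0, -0.1, 0.2], models)
print(label_pulse, label_flat)
```

Including a noise model among the candidates is what turns the same scoring rule into a detector as well as a classifier.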

3  Experimental  Results  and  Discussions 

To  assess  the  utility  of  HMMs  for  ATGM  signature  detection  and  classification,  a  set  of  synthetic  data  was  generated 
using  software  from  GTRI  [5].  The  imaging  sensor  was  modeled  with  the  GTSENSE  package.  A  3D  representation 
of  the  White  Sands  missile  range  was  used  to  obtain  the  background  imagery,  which  was  rendered  using  GTSCENE. 
The  ATGM  signatures  are  based  on  the  Meppen  ATGM  live  tests  conducted  in  May  1998. 

The  simulated  ATGM  warning  system  was  configured  to  be  mounted  on  a  stationary  platform  and  contained 
four  IR  sensors.  Each  sensor  had  a  field  of  view  of  90  and  14.25  degrees  in  the  horizontal  and  vertical  orientations 
respectively  and  operated  at  a  rate  of  500  frames  per  second.  A  360  degree  field  of  view  was  obtained  by  pointing 
each  sensor  to  a  different  cardinal  direction.  The  focal  plane  array  had  256  x  32  pixels  and  was  obtained  with  a 
uniformly  spaced  sampling  pattern  that  mapped  the  world  onto  a  flat  plane. 
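
A few quantities implied by this configuration can be worked out directly. The short calculation below is a sketch only: the per-pixel angles are averages over the field of view (the flat-plane mapping means the true angular size varies across the array), and the one-second window is taken from the detection-time goal stated in the Introduction.

```python
h_fov_deg, v_fov_deg = 90.0, 14.25     # per-sensor field of view
cols, rows = 256, 32                   # focal plane array size
n_sensors, frame_rate_hz = 4, 500

total_azimuth_deg = n_sensors * h_fov_deg          # full horizontal coverage
h_deg_per_pixel = h_fov_deg / cols                 # average horizontal angle per pixel
v_deg_per_pixel = v_fov_deg / rows                 # average vertical angle per pixel
pixel_histories = n_sensors * cols * rows          # 1-D time histories to screen
frames_in_one_second = frame_rate_hz * 1           # frames within a one-second window

print(total_azimuth_deg, h_deg_per_pixel, v_deg_per_pixel,
      pixel_histories, frames_in_one_second)
```

The number of pixel histories gives a sense of why an inexpensive first-stage screen is needed before any HMM scoring.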

The  experiments  performed  involve  1)  detection  of  the  ATGM  signatures  against  a  set  of  randomly  selected 
background  pixel  signals;  and  2)  a  classification  test  among  3  target  types:  HOT,  MILAN,  and  TOW. 

1.  Detection 

Background  pixel  signals  were  chosen  from  synthetic  sensor  images.  These  signals  were  obtained  by  randomly 
choosing  background  pixels  located  outside  the  neighborhood  of  the  real  signature  from  each  training  image. 
The  set  of  background  signals  is  divided  into  subsets  of  200  training  sequences  and  100  testing  sequences.  Each 
sequence contains 100 samples. Experiments are performed using 25 samples/sequence obtained by decimation
and  also  by  using  the  full  undecimated  sequences  directly  to  determine  if  complexity  could  be  reduced  without 
loss  in  detection  performance.  Three  simple  features  were  used,  all  of  which  were  derived  by  temporal  filtering 
(with  different  coefficient  values).  Table  1  shows  the  detection  performance  of  the  baseline  HMM  at  25  and 
100  samples  per  sequence  as  a  confusion  matrix  that  tabulates  the  correct  detection  and  false  alarms. 

2.  Classification 

Here  we  investigate  the  capability  of  the  HMM  to  distinguish  among  three  target  types:  HOT,  MILAN,  and 
TOW.  The  training  and  testing  in  this  stage  repeats  the  same  procedure  described  in  the  previous  part.  A  set 
of  training  sequences  was  selected  from  the  available  pool  of  sequences,  and  the  remaining  sequences  were  used 
for  testing.  This  process  was  repeated  200  times  using  different  subsets  of  the  sequence  pool.  A  small  amount 
of  noise  was  added  to  the  set  at  each  run.  An  average  was  taken  over  the  200  runs  to  assess  the  performance. 
Table  2  shows  the  classification  performance  of  the  baseline  HMM  at  25  and  100  samples  per  sequence  as  a 
confusion  matrix  that  tabulates  the  correct  and  incorrect  classifications. 

It  can  be  seen  from  Table  1  and  Table  2  that  HMMs  are  highly  capable  of  both  detecting  and  classifying  the 
ATGM  sequences  and  the  background  noise  signals  when  the  original  sequences  are  subsampled  by  a  factor  of  5  and 
20 (100 and 25 samples per sequence, respectively). The detection/classification rates are 99%/91.3% for the
100-sample system and 100%/88.7% for the 25-sample system.
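
The totals quoted here can be reproduced from the confusion matrices in Table 1 and Table 2 as the mean of the diagonal entries. This reading assumes that every class contributes the same number of test sequences, which the paper does not state explicitly; the function name is illustrative.

```python
def total_percent(confusion):
    """Mean of the diagonal of a confusion matrix whose rows are in percent."""
    return sum(confusion[i][i] for i in range(len(confusion))) / len(confusion)


# Rows are the true class, columns the assigned class (values from Tables 1 and 2).
detection_t25 = [[100, 0], [0, 100]]
detection_t100 = [[98, 2], [0, 100]]
classification_t25 = [[77, 2, 21], [5, 90, 5], [1, 0, 99]]
classification_t100 = [[84, 1, 15], [6, 92, 2], [2, 0, 98]]

for name, matrix in [("detection, T = 25", detection_t25),
                     ("detection, T = 100", detection_t100),
                     ("classification, T = 25", classification_t25),
                     ("classification, T = 100", classification_t100)]:
    print(name, round(total_percent(matrix), 1))
```

The matrices also show where the classification errors concentrate: most are HOT sequences labelled as TOW.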

In  summary,  ATGM  detection  and  classification  is  a  relatively  new  problem  that  has  arisen  from  the  recent 
introduction  of  low-cost  anti-tank  weaponry.  The  development  of  an  onboard  early  ATGM  warning  system  is  expected 
to have a significant impact on the safety of tank crews. At this point in time, no such system exists that provides
reliable detection and threat classification, operating under the above-mentioned sensor quality limitations and
response  time  constraints.  The  work  reported  in  this  paper  represents  a  first  step  toward  providing  a  solution  to  the 
early  ATGM  warning  system  problem  and  might  be  used  as  a  benchmark  for  future  work.  Our  conclusion  is  that 
HMMs  provide  a  promising  approach  for  rapid  threat  detection.  The  next  aspect  of  the  problem  we  plan  to  study  is 
detection and discrimination in the presence of gun flashes, which can cause false alarms.

(a)
                Percent Classified as
                Targets    Unknown
    Targets       100          0
    Unknown         0        100
    total percent detection = 100

(b)
                Percent Classified as
                Targets    Unknown
    Targets        98          2
    Unknown         0        100
    total percent detection = 99

Table 1: The effect of the observation length on the detection performance evaluated on the testing
data set: (a) T = 25 samples and (b) T = 100 samples.


(a)
                Percent Classified as
                HOT     MILAN     TOW
    HOT          77        2       21
    MILAN         5       90        5
    TOW           1        0       99
    total percent recognition = 88.7

(b)
                Percent Classified as
                HOT     MILAN     TOW
    HOT          84        1       15
    MILAN         6       92        2
    TOW           2        0       98
    total percent recognition = 91.3

Table 2: The effect of the observation length on the classification performance evaluated on the testing
data set: (a) T = 25 samples and (b) T = 100 samples.


Thomson’s Multiple-Window (MW) method is applied to the problem
of  target  detection  in  HF  skywave  radar.  The  MW  method  makes  minimal 
assumptions  about  the  noise  environment,  is  specifically  designed  for  short 
data  segments  and  has  high  resolution.  These  properties  suggest  that  the 
MW  method  may  be  useful  in  dealing  with  some  of  the  challenges  of  target 
detection  in  skywave  radar.  The  performance  of  a  simple  MW  detector  and 
a  conventional  CFAR  detector  are  compared  using  data  from  the  Jindalee 
skywave  radar. 


1.  INTRODUCTION 

This  paper  investigates  the  application  of  Thomson’s  Multiple  Window  (MW) 
method  to  the  problem  of  target  detection  in  HF  skywave  radar.  The  motivation 
is  to  overcome  performance  limitations  of  conventional  target  detectors  in  dealing 
with  inhomogeneous  noise  environments.  The  performance  of  conventional  and 
MW-based  detectors  are  analysed  in  some  simple  scenarios,  using  both  synthetic 
and  real  data.  Conclusions  are  drawn  as  to  the  feasibility  of  the  MW  detector  and 
areas  for  further  research  are  identified. 

The  paper  is  organised  as  follows.  In  section  2  we  provide  background  to  the 
problem  at  hand.  In  section  3  we  briefly  describe  the  MW  approach  to  detection 
and illustrate its performance by comparing it with a conventional Constant False
Alarm Rate (CFAR) detector using a simple data model. In section 4 we carry
out a similar performance comparison by injecting synthetic targets into real noise
data  from  the  Jindalee  radar.  Conclusions  are  presented  in  section  5. 

2.  BACKGROUND 

HF  skywave  radars  use  ionospheric  refraction  to  detect  and  track  targets  over  vast 
coverages  and  at  ranges  of  up  to  3000  km.  A  fundamental  design  requirement  for  a 
target  detector  in  skywave  radar  is  that  it  perform  effectively  in  an  inhomogeneous 
noise  environment  which  can  include  radar  system  noise,  surface  clutter  returns, 
meteor returns, atmospherics and various forms of radio-frequency interference.

[Figure labels: Surface clutter; Target signal]

FIG. 1. A dwell of raw Jindalee data, showing signal power on a grey-scale ARD display.
The left and right hand edges of the display correspond to 0 Hz Doppler shift.


For  aircraft  surveillance  the  radar  revisit  time  is  typically  of  the  order  of  tens  of 
seconds  and  must  generally  be  assumed  greater  than  the  correlation  time  of  the 
noise.  This  means  that  for  each  revisit  the  target  detector  is  required  to  provide 
CFAR  candidate  detections  to  the  tracking  system,  given  minimal  prior  information 
about  the  noise  environment.  Moreover,  the  number  of  available  data  samples  per 
revisit  is  generally  relatively  small. 

The Jindalee skywave radar transmits a repetitive, linear sweep, frequency-modulated
continuous-wave signal and processes the received signal using an approximate,
3-dimensional matched filter for azimuth, range and Doppler. The processing
includes pre-processing to excise transients due to atmospherics and meteor
returns, data windowing to control sidelobe leakage and CFAR processing to estimate
local noise power statistics using samples from a neighbourhood of each
Azimuth-Range-Doppler (ARD) cell (see [1] and references therein for a discussion
of CFAR processing algorithms). An example dwell of data, in raw ARD format, is
shown in figure 1. The advantages of this conventional approach are low to moderate
computational load and good performance over a moderate range of operating
conditions. However, because CFAR processing requires a neighbourhood of ARD
space, there is the potential for bias due to the inhomogeneity of the noise statistics.
This can result in a non-uniform false alarm rate and degraded probability of
detection, both of which can degrade tracking performance.

Thomson’s Multiple-Window (MW) method [2, 3, 4] provides a relatively new
method for CFAR detection of harmonic lines in unknown Gaussian noise. Importantly,
the MW method makes minimal assumptions about the noise environment,
is specifically designed for short data segments and has high resolution. In essence,
the MW method uses a series of orthogonal data windows to generate multiple,
independent realisations of the noise process in the frequency domain, based on a
single coherent radar dwell. This means that the MW method can estimate local
noise statistics without recourse to neighbouring ARD samples, thereby avoiding
the potential bias of conventional CFAR processing.

3. THE MULTIPLE-WINDOW METHOD

We  now  briefly  describe  the  MW  method  of  target  detection,  contrasting  it  with 
conventional  CFAR  processing.  Detailed  expositions  of  the  MW  method  can  be 
found  in  [2,  3,  4].  For  simplicity  we  apply  the  MW  method  in  the  sweep/Doppler 
domain  only,  and  this  is  done  independently  in  each  Azimuth-Range  (AR)  cell.  We 
also  assume  that  there  is  at  most  one  target  per  AR  cell,  though  the  results  are 
valid  for  multiple  targets,  provided  that  the  targets  are  well  separated  in  Doppler. 

For ideal targets the received discrete-time signal in any given AR cell, prior to
Doppler processing, can be modelled as a harmonic line in coloured noise:

$$x[n] = \mu \exp(j 2\pi f_0 n) + z[n], \qquad n = 0, \ldots, N-1 \tag{1}$$

where the constants $f_0$ and $\mu$ are the Doppler frequency and complex amplitude
of the target, respectively, and $N$ is the number of sweeps within a coherent radar
dwell (typically $\le 128$). The noise $z[n]$ is assumed to be a stationary, zero-mean,
complex, Gaussian random process with power spectral density $S(f)$. The target
detection problem is defined in terms of the binary hypothesis test $H_0 : \mu = 0$
versus $H_1 : \mu \ne 0$, to which the usual Neyman-Pearson criterion is then applied.

Doppler processing in skywave radar conventionally employs data windowing to
avoid sidelobe leakage from very strong surface clutter returns. In this paper we
use as an example the minimum 4-sample Blackman-Harris window. In contrast,
the MW method employs an orthonormal set of windows $\{w_k[n]\}$, $k = 0, \ldots, N-1$,
each of which satisfies the $N \times N$ Toeplitz matrix eigenvalue problem [2]

$$\sum_{m=0}^{N-1} \frac{\sin\{2\pi W (m-n)\}}{\pi (m-n)}\, w_k[m] = \lambda_k\, w_k[n]. \tag{2}$$

For any specified analysis bandwidth $W$ the multiple windows have a maximal
energy concentration property within the band $f \in (-W, W)$, which is measured
by the eigenvalues $\{\lambda_k\}$. This allows the windows to be arranged in descending
order of fractional energy concentration, $1 > \lambda_0 > \lambda_1 > \cdots > \lambda_{N-1}$. Although
sidelobe leakage decreases with increasing $W$, this must be traded against reduced
frequency resolution.

For any chosen $W$ only the first $K$ windows are retained, the remainder being
discarded due to poor sidelobe performance. Thomson [3] suggests the choice
$K = 2NW - 1$ or $K = 2NW - 3$ to minimise sidelobe leakage. For $W = 4/N$ the first
multiple window has sidelobe performance similar to that of the abovementioned
Blackman-Harris window, so this value of $W$ will be used for comparison purposes
in this section.

For a single, arbitrary Doppler window $w[n]$ it is straightforward to show using
(1) that the windowed Discrete Fourier Transform (DFT) of $x[n]$ at $f_0$, denoted by
$y_w(f_0)$, is a complex Gaussian random variable distributed as

$$y_w(f_0) \sim CN\!\left(\mu\sqrt{G},\; S(f_0)\right), \tag{3}$$

where $G$ is the net processing gain of the window, $G = \left(\sum_{n=0}^{N-1} w[n]\right)^2$, and we
assume, without loss of generality, that all windows are normalised to unit incoherent
power gain, $\sum_{n=0}^{N-1} w[n]^2 = 1$. It is important to note that, in deriving (3), it is also
assumed that $S(f)$ is slowly varying within the analysis band $|f - f_0| < W$.

FIG. 2. Comparison of performance of the MW detector with conventional unwindowed
(Optimal) and windowed (Blackman-Harris) detectors at various values of Pfa, in Gaussian noise.
Note that for the latter two detectors the noise power spectral density $S(f_0) = \sigma^2$ is assumed
known a priori, while for the MW detector it is unknown. In all cases the target SNR is defined
as $N|\mu|^2/\sigma^2$. The MW detector uses $N = 128$, $W = 4/N$, $K = 5$.

It follows from (3) that, provided $f_0$ is known, the (matched filter) estimate
$\hat{\mu} = y_w(f_0)/\sqrt{G}$ of $\mu$ is obtained from a single realisation of $y_w(f_0)$. Also, if
$S(f_0) = \sigma^2$ is known then the decision rule for the Neyman-Pearson detector is
expressed in terms of the likelihood ratio

$$l(y_w) = \frac{|y_w(f_0)|^2}{\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; t. \tag{4}$$

It is straightforward to show that $l(y_w)$ has a non-central chi-squared distribution
with two degrees of freedom [5], $l(y_w) \sim \chi'^2(2, d)$, where $d$ is the output target SNR,
given by $d = G|\mu|^2/\sigma^2$.
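
To make the processing gain and the known-noise threshold concrete, the following sketch compares a rectangular window with a 4-term Blackman-Harris window and computes the threshold for a chosen Pfa. Several points are assumptions rather than statements from the paper: the window coefficients are the commonly quoted minimum 4-term values, the periodic form of the window is used, and the noise convention is taken to be $E|y_w|^2 = \sigma^2$ under $H_0$, so that the test statistic in (4) is exponentially distributed with unit mean and the threshold is $t = -\ln(\mathrm{Pfa})$.

```python
import math


def rectangular(n_samples):
    """Unwindowed case, scaled to unit incoherent power gain."""
    return [1.0 / math.sqrt(n_samples)] * n_samples


def blackman_harris(n_samples):
    """4-term Blackman-Harris window (periodic form), scaled so that sum(w^2) = 1."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    w = [a0 - a1 * math.cos(2.0 * math.pi * n / n_samples)
         + a2 * math.cos(4.0 * math.pi * n / n_samples)
         - a3 * math.cos(6.0 * math.pi * n / n_samples) for n in range(n_samples)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def processing_gain(w):
    """G = (sum of w)^2, valid because the window has unit incoherent power gain."""
    return sum(w) ** 2


def threshold_known_noise(pfa):
    """Threshold on |y_w|^2 / sigma^2 when the noise power is known (assumed convention)."""
    return -math.log(pfa)


N = 128
g_rect = processing_gain(rectangular(N))
g_bh = processing_gain(blackman_harris(N))
loss_db = 10.0 * math.log10(g_rect / g_bh)
t = threshold_known_noise(1e-3)
print(g_rect, g_bh, loss_db, t)
```

Under these assumptions the windowing loss of the Blackman-Harris window relative to the unwindowed case is about 3 dB, which is the price paid for its sidelobe suppression.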

In practice, however, a single realisation of $y_w(f_0)$ is not sufficient to satisfy the
Neyman-Pearson criterion for detection because $S(f_0)$ is unknown. The role of
conventional CFAR processing is to estimate $S(f_0)$ by sampling neighbouring ARD
cells. In favourable (i.e. homogeneous) noise conditions the effective number of
independent samples is large (say, tens of samples) and the noise estimate is unbiased
and of very low variance. Achieved performance in this case will therefore approach
that of the detector in (4), which is shown in figure 2 over a wide range of Pfa
values, with the Blackman-Harris window and with no window (“Optimal”). As
discussed in section 2, however, the performance of a conventional CFAR detector
can be significantly degraded if the noise is inhomogeneous.

In contrast with the above, the orthonormal multiple windows can be used to
generate $K$ approximately independent realisations of the windowed DFT, denoted
by the vector $y = [y_0(f_0) \cdots y_{K-1}(f_0)]^T$. Again, under the assumption of slowly
varying $S(f)$ in the analysis band $|f - f_0| < W$, it can be shown that

$$y \sim CN\!\left(\mu V,\; S(f_0) I\right) \tag{5}$$

where $V = [\sqrt{G_0} \cdots \sqrt{G_{K-1}}]^T$ is the vector of (square-root) net processing gains for the
individual multiple windows. Applying least-squares regression analysis [4, 5] to (5)
one obtains the decision rule for the Neyman-Pearson detector,

$$l(y) = \frac{(2K-2)\, |\hat{\mu}|^2\, G_{tot}}{2\, (y - \hat{\mu} V)^H (y - \hat{\mu} V)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; t, \tag{6}$$

which is distributed as non-central F,

$$l(y) \sim F'(2,\; 2K-2,\; d_{MW}), \tag{7}$$

where $d_{MW} = G_{tot}|\mu|^2/\sigma^2$ is the output target SNR, $\hat{\mu} = V^H y / G_{tot}$ and
$G_{tot} = V^H V = \sum_{k=0}^{K-1} G_k$ is the total, net processing gain of the $K$ multiple windows.
It follows from (6) and (7) that $S(f_0)$ need not be known to achieve any desired Pfa, though
Pd is of course strongly dependent on $S(f_0)$ through $d_{MW}$. The performance of
this MW detector is included in figure 2.
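
The statistic in (6) can be sketched numerically as below. Two substitutions are made for the sake of a short, dependency-free example and should be read as assumptions: orthonormal sine tapers stand in for Thomson's windows (they are orthonormal but are not the solutions of (2) and have no bandwidth parameter $W$), and the threshold uses the standard closed form for the tail of a central F distribution with 2 and $2K-2$ degrees of freedom, $P(F > t) = (1 + t/(K-1))^{-(K-1)}$, which is not taken from this paper. All function names are illustrative.

```python
import cmath
import math
import random


def sine_tapers(n_samples, n_windows):
    """Orthonormal sine tapers, used here as a stand-in for the multiple windows."""
    c = math.sqrt(2.0 / (n_samples + 1))
    return [[c * math.sin(math.pi * (k + 1) * (n + 1) / (n_samples + 1))
             for n in range(n_samples)] for k in range(n_windows)]


def mw_statistic(x, f0, windows):
    """Return the F-type statistic of (6) and the amplitude estimate mu_hat."""
    n_windows = len(windows)
    # Windowed DFTs y_k(f0) and gains V_k = sum_n w_k[n], so that G_k = V_k^2.
    y = [sum(w[n] * x[n] * cmath.exp(-2j * math.pi * f0 * n) for n in range(len(x)))
         for w in windows]
    v = [sum(w) for w in windows]
    g_tot = sum(vk * vk for vk in v)
    mu_hat = sum(vk * yk for vk, yk in zip(v, y)) / g_tot
    residual = sum(abs(yk - mu_hat * vk) ** 2 for vk, yk in zip(v, y))
    return (n_windows - 1) * abs(mu_hat) ** 2 * g_tot / residual, mu_hat


def f_threshold(pfa, n_windows):
    """Threshold t with P(F > t) = pfa for a central F(2, 2K - 2) variable."""
    return (n_windows - 1) * (pfa ** (-1.0 / (n_windows - 1)) - 1.0)


random.seed(0)
N, K, f0 = 64, 5, 0.2
noise = [complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) / math.sqrt(2.0)
         for _ in range(N)]
x = [10.0 * cmath.exp(2j * math.pi * f0 * n) + z for n, z in enumerate(noise)]
windows = sine_tapers(N, K)
f_signal, mu_hat = mw_statistic(x, f0, windows)
t = f_threshold(1e-3, K)
print(f_signal > t, abs(mu_hat), t)
```

Note that the noise power never enters the statistic or the threshold, which is the CFAR property discussed above; the threshold for K = 5 at Pfa = 0.001 is roughly 18.5 under these assumptions.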

To  interpret  figure  2,  note  that  there  are  two  key  factors  that  govern  MW  detector 
performance.  First  is  the  total  processing  gain  Gtot  of  the  multiple  windows,  which 
for  the  present  case  is  around  2.5  dB  better  than  the  Blackman-Harris  window, 
independent of Pfa. Second, and much more significant here, is the number of
degrees  of  freedom,  2K  —  2  =  8  in  (7)  which,  when  expressed  in  terms  of  SNR 
“gain”,  is  strongly  dependent  on  Pfa,  particularly  at  low  to  moderate  numbers  of 
degrees  of  freedom. 

Of  the  three  Pfa  values  in  figure  2,  the  middle  one  (Pfa =0.001)  is  most  typical 
of  the  Jindalee  radar.  For  real  skywave  radar  data,  in  a  region  of  homogeneous 
noise,  one  therefore  expects  the  performance  of  the  MW  detector  to  be  of  the  order 
of  a  few  dB  poorer  than  a  conventional  detector  using  the  Blackman-Harris  window. 
To  confirm  this  a  Monte-Carlo-type  analysis  is  carried  out  in  the  next  section  using 
real  noise  data  from  the  Jindalee  radar. 


4.  ANALYSIS  USING  JINDALEE  NOISE  DATA 

To more accurately tune the MW detector for typical operating conditions, synthetic
coloured noise was generated by approximating some real Jindalee data using
a 4th order autoregressive (AR) process. Pd was then calculated by injecting synthetic
targets into realisations of the autoregressive noise, from which the optimum values for
the MW detector were estimated to be W = 5/N, K = 6. The effect of pre-whitening the
radar data, as discussed in [2, 4] for example, was also investigated. For this purpose a simple
high-pass FIR filter was used to suppress surface clutter and was implemented
using the filtfilt algorithm from MATLAB, so as to avoid filter transients.
It was found that pre-whitening generally improved detection performance and the
optimum values W = 4/N, K = 7 were estimated.

The  performance  of  the  tuned  MW  detectors  was  then  estimated  for  a  single  dwell 
of  Jindalee  data  by  injecting  synthetic  targets,  as  described  in  [6].  The  dwell  used 
was  representative  of  benign  operating  conditions,  similar  to  that  of  figure  1,  except 
that  there  were  no  meteor  returns  present.  The  “conventional  CFAR  detector”  used 
for  comparison  in  this  section  is  one  of  a  number  of  detectors  that  can  be  invoked 
in  the  Jindalee  radar. 

False alarm performance was first analysed as a function of Doppler shift, by
dividing Doppler space into a series of bands and summing up all detections in
the absence of injected targets. The results of figure 3 indicate that the false
alarm performance of the conventional and MW detectors is generally reasonable
for most Dopplers of interest, taking into account expected statistical fluctuations.
The MW detector appears to produce the most uniform Pfa, and it is evident from
figures 3(b) and (c) that pre-whitening mitigates sidelobe leakage near the surface
clutter returns.

PRASCHIFKA AND DOUDLE

FIG. 3. Achieved Pfa of (a) conventional CFAR detector, (b) MW detector without
pre-whitening and (c) with pre-whitening. The desired Pfa is indicated by the horizontal line.
Also, we use N = 128, with W = 5/N, K = 6 in Figure (b) and W = 4/N, K = 7 in Figure (c).

To  estimate  Pd  two  subsets  of  the  ARD  dwell  data  were  used,  consisting  of  1250 
and  684  cells  in  the  high  and  low  (near  clutter)  Doppler  regions,  respectively.  The 
results  of  figure  4(a)  show  that,  even  with  pre-whitening,  the  detection  performance 
of  the  MW  detector  is  roughly  3  dB  poorer  than  a  conventional  detector  at  high 
Doppler. This is roughly consistent with the model results of section 3 and is
accounted  for  by  the  larger  number  of  degrees  of  freedom  used  by  the  conventional 
detector.  In  figure  4(b)  it  is  evident  that  the  performance  gap  is  even  greater  near 
surface  clutter.  The  cause  of  this  is  unclear,  but  is  presumably  related  in  some  way 
to  sidelobe  leakage. 


5.  CONCLUSION 

It  is  concluded  that  for  skywave  radar  the  simple  MW  detector  analysed  in  this 
paper  is,  in  general,  unlikely  to  rival  a  detector  using  conventional  CFAR  processing. 
Although  it  may  be  possible  to  demonstrate  superior  performance  for  the  MW 
detector  in  more  complicated  examples  of  Jindalee  noise  data,  it  is  clear  that  the 
smaller  number  of  degrees  of  freedom  used  by  the  MW  detector  put  it  at  an  unfair 
disadvantage  from  the  outset.  A  fairer  approach  would  be  to  generalise  the  MW 
FIG. 4. Comparison of Pd for the MW detector and a conventional CFAR detector in (a)
a high Doppler, clutter-free region and (b) a low Doppler, near-clutter region of ARD space. A
typical value of Pfa was used. The SNR (dB) axes for both plots are identical, with each minor
tick mark corresponding to 1 dB. For the MW detector the values W = i/N, K = 7 were
used with pre-whitening, while W = 5/N, K = 6 were used without pre-whitening.


technique  to  multi-dimensional  processes  (see  [7],  for  example)  thereby  allowing 
the  MW  detector  to  exploit  extra  degrees  of  freedom  from  the  range  and  azimuth 
dimensions. However, in terms of computational load and ease of implementation,
a  more  practical  approach  may  be  to  construct  a  hybrid  detector  in  which  the 
multiple  Doppler  windows  are  used  simply  to  provide  extra  degrees  of  freedom  to 
a  conventional  CFAR  processor. 
Abstract

The parametric adaptive matched filter (PAMF) for space-time adaptive processing is introduced via the matched filter (MF), multichannel
linear prediction, and the multichannel LDU decomposition. Two alternative implementations of the PAMF are discussed. Issues considered
include sample training data size and constant false alarm rate. Probability of detection is estimated using simulated phased array radar data
for airborne surveillance radar scenarios, and signal-to-interference-plus-noise ratio is estimated for airborne phased array radar measurements.
For large sample sizes, the PAMF performs almost as well as the MF; performance degrades slightly for small sample sizes. In both sample
size ranges the PAMF is tolerant to targets present in the training set.


I.  Introduction 

This paper presents a new model-based space-time adaptive processing (STAP) algorithm for airborne surveillance
phased array radars operating in Gaussian interference. STAP is an area of current interest to the Air Force Research
Laboratory (AFRL) for programs such as the Advanced Airborne Surveillance Program (AASP), Multi-Channel Airborne
Radar Measurement (MCARM), and the Space Based Radar (SBR). The airborne/spaceborne surveillance radar
application presents specific challenges and constraints, but detection performance, computational load, and secondary
(training) data requirements are key issues in all cases. STAP for radar target detection was proposed first by Brennan
and Reed [2]. The method of [2] consists of:

1.  interference  covariance  matrix  estimation  from  target-free  training  data 

2.  weight  vector  calculation 

3.  test  statistic  formation  and  threshold  comparison. 

The threshold exhibits a dependence on the true covariance matrix. Consequently, the constant false alarm rate (CFAR)
property is lost. A modification to attain CFAR was proposed in [3]. A key result of [2] is the rule-of-thumb, referred to as
"Brennan's rule", for training data support so that 3 dB normalized signal-to-interference-and-noise ratio performance is
attained. Specifically, the Brennan rule states that for an array with a JN element-pulse (or spatio-temporal) product,
K = 2JN independent, identically-distributed, target-free training data vectors are needed to attain performance
corresponding to a 3 dB level below optimum. The training data support requirement increases drastically as the problem
dimensionality grows (increased J and/or N). Moreover, the training data support available in practice is limited by the
temporal and spatial non-stationarity of the interference. Also, system characteristics, such as fast-scanning arrays and
receiver bandwidth, impose further restrictions on the amount of training data that can be collected effectively.
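To give a feel for the numbers, the following minimal sketch evaluates Brennan's rule, K = 2JN, and the dimension-cubed operation count associated with inverting a JN x JN matrix. The function names are illustrative and not taken from the paper; the configurations are a J = 4, N = 32 array and a hypothetical doubling of the channel count.

```python
def brennan_training_size(num_channels: int, num_pulses: int) -> int:
    """Training vectors suggested by Brennan's rule, K = 2JN."""
    return 2 * num_channels * num_pulses


def inversion_cost(num_channels: int, num_pulses: int) -> int:
    """Order-of-magnitude operation count, (JN)^3, for inverting a JN x JN matrix."""
    return (num_channels * num_pulses) ** 3


# Doubling the number of channels doubles the training requirement
# but multiplies the inversion cost by eight.
for J, N in [(4, 32), (8, 32)]:
    print(f"J={J}, N={N}: K={brennan_training_size(J, N)}, cost~{inversion_cost(J, N):.2e}")
```

The cubic term is only an order-of-magnitude figure; actual operation counts depend on the solver used.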

Calculation of the weight vector in the conventional method requires the inverse of the JN x JN spatio-temporal
interference sample covariance matrix. This operation has a computational cost on the order of $O(J^3 N^3)$, which
grows cubically with increased problem dimensionality. Thus, it is imperative to reduce the training data and
computational  requirements  of  STAP  algorithms  for  real-time  applications.  Parametric  (or  model)  -based  methods  offer 
a  high-performance  alternative  to  conventional  joint-domain  architectures  and  their  various  approximations  [4],  as  well 
as  to  reduced-rank  techniques  [5].  For  radar  applications,  parametric-based  methodologies  were  formulated  first  for 
single-channel  systems  [7].  More  recently,  the  method  has  been  extended  to  multichannel  systems  [8],  [9],  as  well  as  to 
multichannel  systems  in  non-Gaussian  clutter  environments  [10].  The  method  of  this  paper  is  based  on  approximating 
the  interference  spectrum  with  an  auto-regressive  (AR)  model  of  low  order.  The  fact  that  a  low-order  AR  model  provides 
an  accurate  representation  of  simulated  and  measured  interference  for  a  variety  of  system  and  scenario  conditions  leads 
to  reduced  computational  requirements.  Furthermore,  the  modeling  fidelity  is  attained  using  a  small  fraction  of  the 
Brennan rule training data set, thus presenting reduced secondary data requirements. In addition, the method offers
significant  improvement  in  detection  performance  over  the  conventional  adaptive  matched  filter  (AMF)  [3].  Specifically, 
it  is  demonstrated  herein  that  with  a  large  sample  support  the  PAMF  approximates  closely  the  detection  performance 
of  the  optimal  known-covariance  matched  filter  (MF)  [3].  It  is  demonstrated  also  that  the  PAMF  provides  significantly 
improved  detection  performance  over  the  AMF  using  only  a  small  fraction  of  the  secondary  data  required  by  the  AMF. 
Furthermore,  the  PAMF  is  tolerant  to  the  presence  of  targets  in  the  secondary  data,  for  both  small  and  large  secondary 
data  set  sizes. 

Multichannel  parameter  identification  methods  constitute  an  inherent  part  of  the  PAMF.  The  identification  algorithms 
considered  herein  are  the  Strand-Nuttall  (SN)  and  the  least-squares. 
II. PHASED ARRAY RADAR DETECTION

A side-looking linear phased array radar configuration is considered herein, with J array channels and a coherent
processing interval (CPI) of N pulse repetition intervals (PRIs). A binary hypothesis test is applied to the JN-element
complex baseband array measurement vector, $x \in \mathbb{C}^{JN}$. One CPI is the time elapsed during the collection of returns
from  K’t’  range  bins,  and  the  data  collected  in  one  CPI  is  referred  to  as  the  data  cube.  The  data  vector  contains 
an unwanted disturbance $d \in \mathbb{C}^{JN}$ with positive definite covariance matrix $R_d$, and may contain an additive desired
signal $ae$ with fixed but unknown complex amplitude $a$ and known signal steering vector $e \in \mathbb{C}^{JN}$. The disturbance
consists of partially-correlated clutter c, directional broadband interference (jamming) i, and thermal white noise w,
with covariance matrices $R_c$, $R_i$, and $R_w$, respectively. It is assumed that the disturbance components are additive and
pairwise independent, and each is a stationary, zero-mean, Gaussian-distributed process. Thus, $x \sim \mathcal{CN}(ae, R_d)$, that is,
x satisfies the complex normal distribution with mean $ae$ and covariance $R_d = R_c + R_i + R_w$. The binary detection
problem is to select between hypotheses $H_0: a = 0$ and $H_1: a \neq 0$, given a single realization of x. For each pulse in the
CPI, the array output sequence is processed to generate a scalar detection test statistic, which is compared to a threshold.
If the test statistic exceeds the threshold, $H_1$ is declared. Otherwise, $H_0$ is selected. In practice, $R_d$ is unknown, and
must be estimated from data considered to be "signal free". This constitutes the adaptive detection problem. In
the conventional approach, $R_d$ is replaced with its maximum likelihood estimate obtained from $K \geq JN$ independent
"secondary" or "training" data vectors $d_k$, $k = 1, 2, \ldots, K$, with $d_k \sim \mathcal{CN}(0, R_d)$. For these conditions, the maximum
likelihood estimator is the sample matrix estimator, $\hat{R}_d = \frac{1}{K} \sum_{k=1}^{K} d_k d_k^H$. Adaptive detection in radar systems of the
type considered herein is accomplished by a "moving window" processing approach, wherein the detection test is applied
to each range bin in the data cube. The range bin selected for testing at a particular instant is referred to as the "primary
data". The filter applied to the primary data is generated adaptively, utilizing the secondary data to design the filter
and to extract information relevant to the determination of the threshold. In this paper a detection method is presented
that uses prediction error filtering (PEF), in which the filter coefficients contain the disturbance correlation information
in compact form. These filters use a time series representation of the data. Thus, it is convenient to introduce the
sequence representation of the data as $x(n)$, $n = 0, 1, \ldots, N-1$, with $x(n) \in \mathbb{C}^{J}$, and $x = [x^T(0) \ldots x^T(N-1)]^T$.

Defining  corresponding  sequences  for  the  disturbance  process  and  its  components,  the  detection  problem  is  re-stated 
as: 

$$H_0: \; x(n) = d(n), \qquad n = 0, 1, \ldots, N-1$$
$$H_1: \; x(n) = a\,e(n) + d(n), \qquad n = 0, 1, \ldots, N-1 \qquad (1)$$

The target steering vector sequence $e(n)$, $n = 0, 1, \ldots, N-1$, is determined by the target normalized Doppler and
spatial frequencies together with the target spatial steering vector, which is an element of $\mathbb{C}^{J}$.

Thus, the concatenated (block) target steering vector e is of the form $e = [e^T(0) \ldots e^T(N-1)]^T$. A similar definition
holds  for  other  sequences.  Block  vector  and  vector  sequence  representations  are  used  interchangeably  throughout  this 
paper. 
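The sample matrix estimator described in this section can be illustrated with a short numerical sketch. It draws zero-mean complex Gaussian training vectors with a known covariance and averages their outer products. The covariance used here (an exponentially correlated matrix plus a white-noise floor) and the small dimension are assumptions chosen only for illustration; they are not the clutter model of this paper.

```python
import numpy as np


def complex_gaussian(cov, num_vectors, rng):
    """Draw num_vectors samples of CN(0, cov); returns shape (dim, num_vectors)."""
    dim = cov.shape[0]
    chol = np.linalg.cholesky(cov)                      # cov = L L^H
    white = (rng.standard_normal((dim, num_vectors))
             + 1j * rng.standard_normal((dim, num_vectors))) / np.sqrt(2.0)
    return chol @ white


def sample_covariance(training):
    """Sample matrix estimator (1/K) sum_k d_k d_k^H for training of shape (dim, K)."""
    return training @ training.conj().T / training.shape[1]


rng = np.random.default_rng(0)
dim = 8                                                 # stands in for JN
idx = np.arange(dim)
true_cov = 0.9 ** np.abs(idx[:, None] - idx[None, :]) + 0.1 * np.eye(dim)

est = sample_covariance(complex_gaussian(true_cov, 20000, rng))
rel_err = np.linalg.norm(est - true_cov) / np.linalg.norm(true_cov)
print(f"relative Frobenius error: {rel_err:.4f}")
```

With fewer training vectors than the vector dimension the estimate is singular, which is one reason the secondary data requirement matters in practice.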


III.  MATCHED  FILTER  CONFIGURATIONS 

For known $R_d$ and unknown signal amplitude, a CFAR test statistic was proposed in [3]. This test statistic takes the
form

$$\Lambda_{MF} = \frac{|e^H R_d^{-1} x|^2}{e^H R_d^{-1} e} \qquad (2)$$

where MF denotes matched filter. This test is a normalized version of that proposed in [2]. For the AMF test statistic,
the unknown $R_d$ is replaced by its maximum likelihood estimate $\hat{R}_d$, so that

$$\Lambda_{AMF} = \frac{|e^H \hat{R}_d^{-1} x|^2}{e^H \hat{R}_d^{-1} e} \qquad (3)$$

and is referred to as the CFAR AMF. The MF and AMF detection statistics admit various interpretations, each providing
unique insight. One interpretation is derived from the matrix square-root factorization of the sample block covariance
matrix and provides the reason for the matched filter name. A second interpretation is based on the relation that exists
between multichannel linear prediction and the multichannel (or block) LDU decomposition, and provides the insight
for the PAMF, which is the main motivation for this development. That groundwork is laid out next in the context of
the MF, but the approach is valid also for the AMF. Let $s = R_d^{-1/2} e$ and $\nu = R_d^{-1/2} x$. The MF test statistic is expressed as

$$\Lambda_{MF} = \frac{|s^H \nu|^2}{s^H s} \qquad (4)$$
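As a numerical check on the equivalence of the direct and whitened forms of the MF statistic, the sketch below evaluates both on random data. It is a minimal illustration under stated assumptions: the covariance is an arbitrary random Hermitian positive definite matrix, and a Cholesky factor is used in place of the symmetric square root (any factor $L$ with $R_d = L L^H$ gives the same statistic). Passing an estimated covariance to the same function gives the AMF form.

```python
import numpy as np


def mf_statistic(x, e, cov):
    """|e^H R^{-1} x|^2 / (e^H R^{-1} e) for a Hermitian covariance R (true or estimated)."""
    rinv_e = np.linalg.solve(cov, e)           # R^{-1} e without forming the inverse
    num = np.abs(np.vdot(rinv_e, x)) ** 2      # vdot conjugates its first argument
    den = np.real(np.vdot(e, rinv_e))
    return num / den


def mf_statistic_whitened(x, e, cov):
    """Same statistic via whitening: s = L^{-1} e, v = L^{-1} x with R = L L^H."""
    chol = np.linalg.cholesky(cov)
    s = np.linalg.solve(chol, e)
    v = np.linalg.solve(chol, x)
    return np.abs(np.vdot(s, v)) ** 2 / np.real(np.vdot(s, s))


rng = np.random.default_rng(0)
dim = 6
G = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
R = G @ G.conj().T + np.eye(dim)               # arbitrary Hermitian positive definite matrix
e = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
x = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)

stat_direct = mf_statistic(x, e, R)
stat_white = mf_statistic_whitened(x, e, R)
print(stat_direct, stat_white)
```

Both forms return the same value up to rounding; the whitened form makes explicit that the data and the steering vector pass through the same filter.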
Notice that under the null hypothesis $\nu$ has identity covariance. Thus, the transformation $R_d^{-1/2}$ is a block whitening
filter for the disturbance. Under the alternative hypothesis, however, the target component in x is rotated and scaled by
the whitening transformation. This requires that the whitening transformation be applied also to the detector steering
vector, e, so that the rotated and scaled steering vector, s, matches the rotated and scaled target component in x.
The numerator of $\Lambda_{MF}$ is a matched filter. The normalization provided by the denominator term results in CFAR
performance. An alternate interpretation of $\Lambda_{MF}$ is obtained by utilizing the block LDU decomposition of $R_d$, which is
of the form $R_d = A D A^H$. In this relation, $A \in \mathbb{C}^{JN \times JN}$ is a lower block-triangular matrix with J x J identity matrices
along the main block diagonal, and $D \in \mathbb{C}^{JN \times JN}$ is a block-diagonal matrix with Hermitian matrices $D_n \in \mathbb{C}^{J \times J}$,
$n = 0, 1, \ldots, N-1$ along the main block diagonal. The matrix $A^{-1}$ is of the form


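$$A^{-1} = \begin{bmatrix} I_J & [0] & \cdots & \cdots & [0] \\ A_1^H(1) & I_J & [0] & \cdots & [0] \\ A_2^H(2) & A_2^H(1) & I_J & \cdots & [0] \\ \vdots & \vdots & & \ddots & \vdots \\ A_{N-1}^H(N-1) & A_{N-1}^H(N-2) & \cdots & \cdots & I_J \end{bmatrix} \qquad (5)$$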

where $I_J$ is the J x J identity matrix, and the block element matrices $A_n^H(k) \in \mathbb{C}^{J \times J}$ have no specific structure. These
matrices are defined with the Hermitian operator in order to allow for consistent notation between various model
identification algorithms. The rows of $A^{-1}$ denote the matrix coefficients of the nth-order multichannel linear predictor
for the process x(n). The corresponding matrix $D_n$ is the covariance matrix of the nth-order prediction error vector.
In terms of the block LDU decomposition and applying the square-root factorization of D, the inverse of the block
covariance matrix can be represented as $R_d^{-1} = A^{-H} D^{-1} A^{-1}$. Using this in eq (2) gives


$$\Lambda_{MF} = \frac{|u^H D^{-1} \epsilon|^2}{u^H D^{-1} u} \qquad (6)$$


where the block vectors $u = A^{-1} e$ and $\epsilon = A^{-1} x$ have been defined implicitly. The block vectors $\epsilon$ and $\nu$ contain the
multichannel element vectors $\epsilon(n)$ and $\nu(n)$, and these are given by

$$\epsilon(n) = \sum_{k=0}^{n} A_n^H(k)\, x(n-k), \qquad n = 0, 1, \ldots, N-1$$
$$\nu(n) = D_n^{-1/2}\, \epsilon(n), \qquad n = 0, 1, \ldots, N-1 \qquad (7)$$

respectively, with $A_n^H(0) = I_J$ for all n. This implies that $\epsilon(n)$ is the output of an nth-order moving-average (MA) filter in block mode,
with input the block vector x. Such a filter is denoted as MA(n). Since these filter coefficients are the linear prediction
coefficients, the vectors in the sequence $\{\epsilon(n) \mid n = 0, 1, \ldots, N-1\}$ are uncorrelated in pairs (given the minimization
criterion associated with linear prediction). This step is equivalent to temporal whitening of the input data sequence,
$\{x(n) \mid n = 0, 1, \ldots, N-1\}$. Furthermore, $\epsilon(n)$ is the nth-order prediction error vector with covariance matrix $D_n$. Thus,
for all n, the covariance matrix of $\nu(n)$ is the identity matrix, $I_J$. Since the transformation generates uncorrelated
elements along the spatial dimension at each time instant, this step is a spatial whitening transformation (or spatial
block whitening filter). Thus, the vector sequence $\{\nu(n) \mid n = 0, 1, \ldots, N-1\}$ is temporally and spatially uncorrelated.
Similar expressions are obtained for the rotated block steering vectors u and s. Thus, the MF statistic can be viewed as
the magnitude squared (power) of the inner product between the two block vectors $\nu$ and s, where $\nu$ is a concatenation
of the filtered data sequence, and s is a concatenation of the filtered detector steering sequence. This quantity is then
normalized by the inner product of the filtered detector steering sequence, as shown above. For the adaptive case, the
matrix coefficients are replaced with their estimates.
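The block LDU interpretation can be checked numerically. The sketch below builds the factors from a Cholesky factorization, which is one convenient route and not necessarily how the factors would be obtained in practice (the text obtains them from linear prediction). If $R_d = L L^H$ and $B$ holds the J x J diagonal blocks of $L$, then $A = L B^{-1}$ has identity diagonal blocks and $D = B B^H$ is block diagonal. The test covariance is an arbitrary random matrix, an assumption made only for the demonstration.

```python
import numpy as np


def block_ldu(cov, J):
    """Return (A, D) with cov = A D A^H, A unit block lower triangular, D block diagonal."""
    chol = np.linalg.cholesky(cov)                  # cov = L L^H, L lower triangular
    diag_blocks = np.zeros_like(chol)
    for n in range(cov.shape[0] // J):
        sl = slice(n * J, (n + 1) * J)
        diag_blocks[sl, sl] = chol[sl, sl]          # B = blockdiag(L_nn)
    A = chol @ np.linalg.inv(diag_blocks)           # diagonal blocks become I_J
    D = diag_blocks @ diag_blocks.conj().T          # D_n = L_nn L_nn^H
    return A, D


rng = np.random.default_rng(1)
J, N = 2, 4
dim = J * N
G = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
R = G @ G.conj().T + np.eye(dim)
e = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
x = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)

A, D = block_ldu(R, J)
u = np.linalg.solve(A, e)                           # u = A^{-1} e
eps = np.linalg.solve(A, x)                         # eps = A^{-1} x (prediction errors)

# LDU form: |u^H D^{-1} eps|^2 / (u^H D^{-1} u)
stat_ldu = (np.abs(np.vdot(u, np.linalg.solve(D, eps))) ** 2
            / np.real(np.vdot(u, np.linalg.solve(D, u))))

# Direct form: |e^H R^{-1} x|^2 / (e^H R^{-1} e)
rinv_e = np.linalg.solve(R, e)
stat_direct = np.abs(np.vdot(rinv_e, x)) ** 2 / np.real(np.vdot(e, rinv_e))
print(stat_ldu, stat_direct)
```

The two values agree to rounding error, which is the equivalence that motivates replacing the full block factorization by a low-order prediction error filter.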


IV.  PARAMETRIC  MATCHED  FILTER 

The above discussion suggests an approximation to the MF with a simplified structure. First, for both residuals, $\epsilon$
and $\nu$ in (7), retain only the vector sequence for the filter of order P, where $1 \leq P \leq N-1$. Second,
let the MA filtering step be a moving window rather than a block window. These two modifications imply that the
temporal and spatial filters have a sequential form:

$$\epsilon(n) = \sum_{k=0}^{P} A^H(k)\, x(n-k+P), \qquad \nu(n) = D_P^{-1/2}\, \epsilon(n), \qquad n = 0, 1, \ldots, N_\epsilon - 1 \qquad (8)$$

where $D_P$ is the covariance matrix of the Pth-order prediction error (Pth-order filtering path), and $N_\epsilon = N - P$. The
above-defined filter outputs have the same symbols as the corresponding variables in the MF, but this is adopted to
limit the number of symbols introduced. The intended case should be clear from the context. In the present context,
$\{\epsilon(n) \mid n = 0, 1, \ldots, N_\epsilon - 1\}$ is the output sequence of an MA(P) filter in moving-window mode and with input the data
Fig. 1. Detection Architecture of PMF


sequence $\{x(n) \mid n = 0, 1, \ldots, N-1\}$. If the matrix coefficients $\{A^H(k) \mid k = 0, 1, \ldots, P\}$ and $D_P$ are determined as the
Pth-order linear prediction coefficients and the prediction error covariance matrix, respectively, and if a system of order
P is an appropriate model for the disturbance, then the MA filter output sequence vectors are uncorrelated in pairs and
$D_P$ is the covariance matrix of $\epsilon(n)$; i.e., under the stated conditions, the MA(P) filter is a whitening filter. In addition,
the vector sequence $\{\nu(n) \mid n = 0, 1, \ldots, N_\epsilon - 1\}$ is also temporally uncorrelated, and each sequence element has covariance
matrix $I_J$. Lastly, the block vector $\nu$ has J(N-P) elements. In general, one or more of the conditions for whitening is
not met, and consequently the filter output sequence has residual color (non-white). Thus, $\epsilon(n)$ and $\nu(n)$ are referred
to hereafter as the temporal and spatio-temporal residual sequences, respectively. As a result of the two modifications
to the MF, the steering sequence is filtered analogously. That is, the temporal and spatio-temporal steering residual
sequences are given by

$$u(n) = \sum_{k=0}^{P} A^H(k)\, e(n-k+P), \qquad s(n) = D_P^{-1/2}\, u(n), \qquad n = 0, 1, \ldots, N_u - 1 \qquad (9)$$

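where $N_u = N_\epsilon = N - P$, and the block vector s has J(N - P) elements. Based on the above discussion, the parametric
MF (PMF) detection statistic is defined as

$$\Lambda_{PMF} = \frac{|s^H \nu|^2}{s^H s} = \frac{\left| \sum_{n=0}^{N-P-1} s^H(n)\, \nu(n) \right|^2}{\sum_{n=0}^{N-P-1} s^H(n)\, s(n)} \qquad (10)$$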


An  architecture  for  the  PMF  is  shown  in  Fig.  1.  Since  the  PMF  is  an  approximation  to  the  MF,  it  is  likely  that 
it  lacks  CFAR  performance.  Although  difficult  to  prove  analytically,  simulation-based  analyses  indicate  that  the  PMF 
offers  CFAR-like  behavior  for  a  variety  of  cases. 
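The PMF computation can be sketched in a few lines, under stated assumptions. The moving-window MA(P) filter is applied to both the data sequence and the steering sequence, each is spatially whitened with a square root of $D_P$, and the statistic is the normalized inner product of the two residual sequences. The function names, the array layout (columns indexed by pulse), and the toy coefficients are assumptions made for illustration; a Cholesky factor is used for the square root of $D_P$, which leaves the statistic unchanged.

```python
import numpy as np


def whiten_sequence(seq, AH, DP):
    """Moving-window MA(P) filter followed by spatial whitening.

    seq: (J, N) array whose columns are x(n) or e(n).
    AH:  list of P+1 arrays of shape (J, J), AH[k] = A^H(k), with AH[0] = I_J.
    DP:  (J, J) prediction-error covariance of order P.
    Returns a (J, N-P) array with columns D_P^{-1/2} sum_k A^H(k) seq(n - k + P).
    """
    P = len(AH) - 1
    J, N = seq.shape
    resid = np.zeros((J, N - P), dtype=complex)
    for k in range(P + 1):
        resid += AH[k] @ seq[:, P - k:N - k]     # input indices n - k + P, n = 0..N-P-1
    chol = np.linalg.cholesky(DP)                # D_P = L L^H
    return np.linalg.solve(chol, resid)


def pmf_statistic(X, E, AH, DP):
    """|sum_n s^H(n) v(n)|^2 / sum_n s^H(n) s(n) for data X and steering sequence E."""
    nu = whiten_sequence(X, AH, DP)
    s = whiten_sequence(E, AH, DP)
    return np.abs(np.vdot(s, nu)) ** 2 / np.real(np.vdot(s, s))


rng = np.random.default_rng(2)
J, N = 2, 8
X = rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))
E = np.ones((J, N), dtype=complex)               # toy steering sequence

# Order 0 with identity parameters reduces to the white-noise matched filter.
stat_p0 = pmf_statistic(X, E, [np.eye(J)], np.eye(J))

# A toy first-order filter; the coefficient value is arbitrary.
AH1 = [np.eye(J), -0.5 * np.eye(J)]
stat_p1 = pmf_statistic(X, E, AH1, np.eye(J))
print(stat_p0, stat_p1)
```

Note that the residual sequences have N - P columns, so the statistic uses J(N - P) elements, as stated in the text.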


V.  PARAMETRIC  ADAPTIVE  MATCHED  FILTER 

When the filter parameters are unknown, they must be estimated adaptively, and multichannel linear prediction can be
applied to estimate parameters $D_P$ and $A^H(k)$ of the MA(P) filter using the secondary data. Furthermore, model types
such as time series (besides the AR) and state variables can be used in the context of this method [11]. Additionally,
for each model type there are alternative parameter identification algorithms, and implementation structures (such as
tapped delay lines or lattice filters for time series). This range of options provides a general form to the parametric
adaptive MF (PAMF) detection statistic, denoted as $\Lambda_{PAMF}$. The form of $\Lambda_{PAMF}$ is as in eq (10), but the quantities
are generated using estimated parameters obtained via an appropriate model identification algorithm. Thus, the PAMF
is formulated as a data-adaptive version of the PMF, further generalized to include a wide variety of whitening filter
types. An architecture for the PAMF is presented in Fig. 2. As for the previous case, all variables in the PAMF
equations and in Fig. 2 are distinct from those of other detection statistics, but the same symbols are used for notational
simplicity. The PAMF lacks the CFAR property because the estimation error in the filter parameters is a function of the
true disturbance covariance matrix. In addition, the estimation error varies as a function of the parameter estimation
algorithm applied. This is true even with knowledge of the best model order. Nevertheless, the results in Section VI (as
well as others not included) indicate that some PAMF configurations (distinct filter implementations) exhibit CFAR-like
behavior over a wide range of parameters and conditions, analogous to the PMF.

Implementation  of  the  PAMF  as  in  Fig.  2  requires  selection  of  a  model  type  to  represent  the  disturbance  (null 
hypothesis  condition),  a  parameter  identification  algorithm,  and  a  whitening  filter  configuration.  The  work  reported 
herein  focuses  on  multichannel  AR  model  types,  and  two  multichannel  identification  algorithms;  namely,  the  Strand- 
Nuttall  (SN)[12],  [13]  and  least-squares  (LS)  [14]  algorithms.  Since  each  algorithm  has  distinct  characteristics  and 
performance  in  the  context  of  the  PAMF,  the  two  configurations  are  denoted  as  PAMF-SN  and  PAMF-LS,  respectively. 
Fig. 2. Detection Architecture of PAMF


Fig. 3. Tapped Delay Line Implementation for MA(P) Filter

The inverse of an AR model is an MA model, and the MA whitening filter is implemented herein as a multichannel
tapped delay line. Linear state variable models and associated identification algorithms are considered in [11], [15].
Specification of the AR(P) system automatically specifies the inverse MA(P) system. For a given system order, the
number of AR(P) (or MA(P)) complex-valued parameters is $J^2 P$, without considering the distinct elements of
the residual covariance matrix. Fig. 3 shows a tapped delay line for an MA(P) temporal whitening filter, followed by
the spatial whitening block filter generated using the residual covariance matrix. Due to lack of space, the interested
reader is referred to [11] for implementation details of the model identification algorithms.
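Since the identification algorithms themselves are not reproduced here, the following is only a generic sketch of how multichannel AR(P) parameters might be estimated from secondary data by least squares. It is a plain forward-prediction fit solved with `numpy.linalg.lstsq`, which is an assumption and may differ from the LS algorithm of [14]. The function name `ls_ar_fit`, the (K, J, N) data layout, and the synthetic AR(1) test process are all hypothetical.

```python
import numpy as np


def ls_ar_fit(training, P):
    """Forward least-squares fit of a multichannel AR(P) model, P >= 1.

    training: (K, J, N) array, K secondary range bins of J channels by N pulses.
    Returns (AH, DP) where AH[k] = A^H(k), AH[0] = I_J, so that the prediction
    error is eps(n) = sum_k AH[k] x(n-k), and DP is the sample covariance of eps(n).
    """
    K, J, N = training.shape
    targets, regressors = [], []
    for d in training:
        for n in range(P, N):
            targets.append(d[:, n])
            regressors.append(np.concatenate([d[:, n - k] for k in range(1, P + 1)]))
    Y = np.array(targets)                                # (M, J), rows x(n)^T
    Z = np.array(regressors)                             # (M, J*P), rows [x(n-1); ...; x(n-P)]^T
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)         # Y ~ Z @ coef
    resid = Y - Z @ coef                                 # rows eps(n)^T
    DP = resid.T @ resid.conj() / resid.shape[0]         # (1/M) sum eps(n) eps(n)^H
    AH = [np.eye(J, dtype=complex)]
    for k in range(P):
        AH.append(-coef[k * J:(k + 1) * J, :].T)         # A^H(k) = -F_k
    return AH, DP


# Synthetic check: x(n) = F x(n-1) + w(n) with unit-variance complex white noise.
rng = np.random.default_rng(3)
K, J, N = 200, 2, 32
F = np.array([[0.5, 0.2], [-0.1, 0.3]])
w = (rng.standard_normal((K, J, N)) + 1j * rng.standard_normal((K, J, N))) / np.sqrt(2.0)
data = np.zeros((K, J, N), dtype=complex)
data[:, :, 0] = w[:, :, 0]
for n in range(1, N):
    data[:, :, n] = data[:, :, n - 1] @ F.T + w[:, :, n]

AH_est, DP_est = ls_ar_fit(data, P=1)
coef_error = np.max(np.abs(AH_est[1] + F))               # AH[1] should be close to -F
print(coef_error)
```

The returned `AH` and `DP` have the form expected by a moving-window whitening filter of order P, so an estimate of this kind could stand in for the known parameters of the PMF.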

VI.  PERFORMANCE  ANALYSES 

A. Definitions and Criteria

Two different power measures were adopted. The first is the per-pulse, per-channel, input SINR, defined as $SINR_{in} = |a|^2 / \sigma_d^2$,
where a is the target amplitude (as defined previously), and $\sigma_d^2$ denotes the variance (power) of each element of
the disturbance vector at each time instant. The second measure is the output SINR for the MF, $SINR = |a|^2 e^H R_d^{-1} e$.
SINR is useful for comparing detection performance of STAP algorithms and test statistics (including the MF), and
analytic expressions for Pd are available as functions of SINR (in the case of the MF test statistic). However, SINR
requires knowledge of the true disturbance block covariance matrix. $SINR_{in}$ can be established in most cases, but
analytic expressions are unavailable. Thus, SINR is adopted for simulated data analyses, and $SINR_{in}$ for MCARM
data analyses. In addition, clutter-to-noise ratio (CNR) and jammer-to-noise ratio (JNR) are defined as per-pulse,
per-channel variance ratios. Detection analyses with simulated data evaluate probability of detection (Pd) for a fixed
value of probability of false alarm (Pfa), whereas analyses with measured data evaluate the detection test statistics. A
high value of the test statistic implies a high Pd since for the methods considered herein the test statistic is the output
SINR, and for the conditions considered herein Pd is a monotonically-increasing function of output SINR. In all analyses
the time-averaged sample residual covariance is used.
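For the known-covariance MF statistic under the Gaussian model of Section II, the dependence of Pd on output SINR can be written in closed form. The sketch below uses a standard result rather than a formula quoted from [3]: under $H_0$ the statistic is exponentially distributed, so $P_{fa} = e^{-\eta}$ for threshold $\eta$, and under $H_1$ twice the statistic is noncentral chi-square with two degrees of freedom and noncentrality 2 SINR. The Poisson-mixture series used to evaluate it, and the function name, are implementation choices made here.

```python
import math


def mf_detection_probability(sinr_db, pfa, max_terms=2000):
    """Pd of the known-covariance MF statistic for a given output SINR (dB) and Pfa.

    Evaluates sum_k Poisson(k; SINR) * Q(k+1, eta), where eta = -ln(Pfa) and
    Q(k+1, eta) = exp(-eta) * sum_{j<=k} eta^j / j! is the chi-square tail with
    2(k+1) degrees of freedom. Intended for SINR up to roughly 25 dB.
    """
    sinr = 10.0 ** (sinr_db / 10.0)
    eta = -math.log(pfa)
    poisson = math.exp(-sinr)          # Poisson weight for k = 0
    gamma_term = math.exp(-eta)        # exp(-eta) * eta^j / j! for j = 0
    gamma_sum = gamma_term             # Q(1, eta)
    pd = poisson * gamma_sum
    for k in range(1, max_terms):
        poisson *= sinr / k
        gamma_term *= eta / k
        gamma_sum += gamma_term
        pd += poisson * gamma_sum
    return pd


for sinr_db in (0.0, 6.0, 9.0, 12.0):
    print(sinr_db, round(mf_detection_probability(sinr_db, 0.01), 4))
```

At SINR = 9 dB and Pfa = 0.01 this evaluates to approximately 0.8635, the MF value listed in Table 1.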

B. Detection Performance Using Simulated Radar Data

A modified version of the airborne surveillance phased array radar clutter model in [16] was used to generate simulated
radar data (this software is based on analytic models similar to those in [3]). Temporal de-correlation effects caused
by internal clutter motion are included by multiplying the mth lag of the clutter model ACS by a factor determined by
the one-lag clutter temporal correlation coefficient. This coefficient is a function of the pulse repetition
Fig. 4. Probability of Detection vs SINR (K=256, Crab=0 deg)


frequency (PRF), the radiation wavelength, and the standard deviation of the clutter velocity components. Simulation-based
detection performance results are presented for two analyses with typical radar system parameters and scenario
conditions. The first analysis compares the performance of three detection statistics (PAMF-SN, PAMF-LS, and CFAR
AMF) via plots of Pd vs. SINR. For the first analysis, J = 4 array channels and N = 32 pulses. Thus, this is referred
to as the J << N Analysis. The second analysis compares the Pd obtained with the PAMF-SN and the PAMF-LS at
one SINR value, SINR = 9 dB. In this second analysis, J = 14 and N = 16; this is the J ≈ N Analysis.

J << N Analysis: All cases considered in this analysis have in common the parameters listed in Fig. 4, as well
as a target normalized spatial frequency of 0.0, a target normalized Doppler frequency of 0.3336, a normalized receiver
noise standard deviation of 1, and CNR = 10 dB. Crab angle, secondary data size (K), and the presence of jamming are
variable, as specified next. Pd as a function of SINR
is presented for three scenario cases: Case 1 is for a crab angle of 0 deg and no jamming. Case 2 is for a crab angle of
20 deg and no jamming. Case 3 is for a crab angle of 0 deg and two barrage jammers; one jammer is at a normalized
spatial frequency of -0.35 with JNR = 15 dB, and the other jammer is at a normalized spatial frequency of 0.2 with
JNR = 50 dB. For each case, two values of secondary data size are considered:
K = 2JN = 256 and K = 2J = 8. Due to limitation of space only sample plots are provided in this paper. Detailed
results will be presented at the conference as well as in a companion refereed journal publication. Fig. 4 presents Pd vs.
SINR for the scenario and system conditions listed above. In this figure the MF curve (solid line) is the upper bound
in performance. This curve was calculated using the analytic relation in [3]. For the other statistics, each Pd value
estimate is determined via Monte Carlo (MC) analysis. First, a threshold is determined that provides Pfa = 0.01 using
50 MC repetitions of 2,000 independent data realizations each. Second, this threshold is used to estimate
Pd, also using 50 MC repetitions of 2,000 independent data realizations each. The CFAR AMF curve
(dotted line) is a spline interpolation to 0.2 dB spacing of simulation-based results (obtained with K = 2JN = 256)
generated at an interval of 1.0 dB along the SINR axis. Pd results for the parametric test statistics were calculated at
3 dB SINR intervals to reduce simulation time. In all cases and for both sample size conditions, P = 3 provided the
best  performance  for  the  PAMF  test  statistics,  which  is  less  than  the  Nuttall  upper  bound,  'Psnu  =  ^-23,  and  the  ES 
constraint,  Ptsi/  =  25.40.  Additional  results  of  our  investigation  not  reported  here  provide  the  following  observations. 
First, the PAMF using small sample size out-performs the CFAR AMF using large sample size. Second, the PAMF-LS
and the PAMF-SN perform similarly. Third, for large sample size (K = 256), the PAMF performs close to the optimal
MF. And fourth, for small sample size (K = 8), the PAMF performance degrades several percentage points relative
to the large sample case, but remains close to the optimal MF. These observations are valid also for a wide variety of
scenario and system conditions, although the relative performance of these two PAMF versions can vary. Systems in
which the number of channels is approximately the same as the number of pulses constitute one such example.
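The two-step Monte Carlo procedure described above (set the threshold from target-free realizations, then count detections with the target present) can be sketched as follows. To keep the example self-contained it uses a white disturbance with a unit-norm steering vector, for which the MF statistic reduces to $|e^H x|^2$ and the output SINR equals $|a|^2$. This is a simplifying assumption and not the clutter scenario of the paper. The helper names are illustrative, and an empirical quantile is one reasonable way to set the threshold.

```python
import numpy as np


def mc_threshold(null_stats, pfa):
    """Threshold exceeded by a fraction pfa of the target-free statistics."""
    return float(np.quantile(null_stats, 1.0 - pfa))


def mc_pd(target_stats, threshold):
    """Fraction of target-present statistics that exceed the threshold."""
    return float(np.mean(target_stats > threshold))


def white_noise(num_trials, dim, rng):
    """num_trials realizations of CN(0, I) of length dim, one per row."""
    return (rng.standard_normal((num_trials, dim))
            + 1j * rng.standard_normal((num_trials, dim))) / np.sqrt(2.0)


rng = np.random.default_rng(4)
dim = 8
num_trials = 50 * 2000                       # 50 repetitions of 2,000 realizations
e = np.ones(dim, dtype=complex) / np.sqrt(dim)
amplitude = np.sqrt(10.0 ** (9.0 / 10.0))    # output SINR of 9 dB when R = I

# Step 1: threshold from target-free data.
null_stats = np.abs(white_noise(num_trials, dim, rng) @ e.conj()) ** 2
threshold = mc_threshold(null_stats, 0.01)

# Step 2: Pd from independent target-present data using that threshold.
target_data = white_noise(num_trials, dim, rng) + amplitude * e
pd_estimate = mc_pd(np.abs(target_data @ e.conj()) ** 2, threshold)
print(threshold, pd_estimate)
```

For this idealized case the threshold should be near -ln(0.01), about 4.6, and the estimated Pd near the analytic MF value for 9 dB.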

J ≈ N Analysis: Table 1 presents Pd results at SINR = 9 dB and K = 256 for the three cases considered. As in
the first analysis, the target normalized spatial frequency is 0.0; however, the target normalized Doppler frequency is
0.1624, which is much closer to the clutter ridge than the value used for
the detection plots. Besides Pd, the table includes the calculated standard deviation of the estimated Pd, denoted as
        Case               MF       CFAR AMF          PAMF-SN (P=1)      PAMF-LS (P=1)
Crab Angle   No. of
  (deg)      Jammers       Pd       Pd      SD(Pd)    Pd      SD(Pd)     Pd      SD(Pd)

    0          0         0.8635   0.4571   0.0397   0.4523   0.1017    0.8474   0.0109
   20          0         0.8635   0.4610   0.0382   0.2160   0.1015    0.8277   0.0276
    0          2         0.8635   0.4565   0.0375   0.3906   0.1018    0.8469   0.0111

TABLE I

Detection performance of the PAMF for the three simulation cases at SINR = 9 dB (J = 14; N = 16; K = 256; target normalized Doppler frequency = 0.1624)


SD[Pd]. The MF and AMF statistics are presented also. Model order is P = 1 for both parametric methods (here
the Nuttall upper bound is 0.86 and the LS constraint is 14.87). Inspection of Table 1 indicates the lack of CFAR for
the parametric detection statistics, although the variability in Pd for the PAMF-LS is only on the order of 2%, and its
performance is as good as or better than in Fig. 4. The performance of the PAMF-SN is comparable to or less than
that of the AMF, which is a noticeable degradation
in relation to the results in Fig. 4. This is foretold by the Nuttall upper bound.

C. Detection Performance Using Measured Radar Data

MCARM  database  [17]  analyses  were  carried  out  using  one  elevation  channel,  four  azimuth  channels,  and  thirty-two 
pulses  from  acquisition  575,  Bight  5  (file  rl050575).  Range  bins  (RBs)  142  through  469,  inclusive,  are  used  (range- 
dependent  power  loss  is  compensated).  Two  distinct  filter  adaptation  procedures  were  applied  to  study  the  effects  of 
secondary  data  size  and  content.  In  Procedure  A  a  fixed-window  filter  is  designed  using  K  =  2JN  =  256  RBs,  selected 
as  RBs  142  through  269,  inclusive,  and  RBs  341  through  468,  inclusive.  A  single  fixed-window  filter  is  designed,  and 
the  detection  test  statistic  is  generated  for  RBs  270  through  340,  inclusive.  In  Procedure  B  a  moving-window  filter  is 
designed  using  K  =  2J  =  8  RBs.  The  moving  window  consists  of  eleven  adjacent  RBs;  [four  secondary  bins  —  one 
guard  bin  —  test  bin  —  one  guard  bin  —  four  secondary  bins].  This  11-bin  window  is  applied  with  RB  270  and  RB 
340  as  the  first  and  last  test  bins,  respectively,  and  the  window  is  slid  from  bin-to-bin  between  these  two  edge  RBs. 
Target-present and target-absent conditions are studied by inserting artificial targets with an SINRin of -30 dB at RBs
291 and 293, and with an SINRin of -10 dB at RBs 238, 269, 373, and 400. Disturbance power ( ) is estimated as a
five-bin average, centered on the RB in which the target is placed. All targets have Jj, = 0.0 and 'fid = 0.1. Model order
values P = [2,3,4] were evaluated for each algorithm and procedure, and the model with best performance was selected.
The selection criterion was a combination of highest target peak value and lowest false-target peak values. Model orders
3  and  4  performed  similarly  in  most  cases.  The  moving-window  CFAR  AMF  filter  required  diagonal  loading  of  the 
sample block covariance (at 55 dB below the maximum diagonal element). A typical plot is shown in Fig. 5. Detailed
results will be presented at the workshop as well as in a companion journal paper. Fig. 5, which is for the fixed-window filter,
shows that the PAMF fixed-window filter (with K = 256) brings out the targets at RBs 291 and 293 at least 25
dB (almost 30 dB for the PAMF-LS) above the 0 dB mean value. Also, the highest background peak is at least 15 dB
below  the  lowest  target  peak  for  the  PAMF  test  statistics.  The  AMF  fails  to  produce  a  peak  at  the  target  RB  locations 
due  to  the  four  large-amplitude  targets  (-10  dB)  with  the  same  steering  vector  in  the  secondary  data.  Additional  results 
for the case of a moving-window filter as well as a fixed-window filter will be presented at the workshop.
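The range-bin bookkeeping behind Procedures A and B can be sketched as follows. This is only an illustrative sketch: the function names are hypothetical and not from the paper, and it covers the selection of secondary range bins only, not the filter design or the test statistic.

```python
def fixed_window_secondary(first_rb, last_rb, test_first, test_last):
    """Procedure A (sketch): all range bins outside the tested span are secondary data."""
    return [rb for rb in range(first_rb, last_rb + 1)
            if rb < test_first or rb > test_last]


def moving_window_secondary(test_rb, n_side=4, n_guard=1):
    """Procedure B (sketch): n_side secondary bins on each side of the test bin,
    separated from it by n_guard guard bins."""
    left = list(range(test_rb - n_guard - n_side, test_rb - n_guard))
    right = list(range(test_rb + n_guard + 1, test_rb + n_guard + n_side + 1))
    return left + right


# Procedure A: secondary RBs 142-269 and 341-468, test RBs 270-340.
secondary_a = fixed_window_secondary(142, 468, 270, 340)
# Procedure B: 11-bin window centered on the first test bin, RB 270.
secondary_b = moving_window_secondary(270)
```

With the values quoted in the text, the fixed window yields K = 256 secondary bins and each position of the moving window yields K = 8.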

VII.  SUMMARY  AND  CONCLUSIONS 

The  parametric-based  approach  introduced  herein  combines  the  use  of  prediction  error  filtering  methods  with  model 
identification  algorithms  to  achieve  data  whitening,  which  is  then  followed  by  matched  filtering.  This  method  is  referred 
to  as  the  parametric  adaptive  matched  filter  (PAMF).  The  PAMF  admits  a  variety  of  distinct  implementations,  based 
on  the  model  identification  algorithm  used  and  the  whitening  filter  architecture  adopted.  Herein  the  Strand-Nuttall 
(SN) and the multichannel formulation of the least-squares (LS) method, both for AR models, were considered.

Detection  performance  results  were  reported  for  a  simulated  radar  data  analysis  involving  the  two  PAMF  algorithms 
and the CFAR AMF. Airborne radar measurements collected from the AFRL MCARM program were processed to
generate  detection  test  statistic  vs.  range  bin,  at  specific  input  SINR  levels.  This  establishes  an  algorithm’s  ability  to 
extract  the  signal  from  the  range  bin  under  test  and  to  reject  unwanted  disturbance  processes. 

The parametric methods out-perform the AMF, providing a significant detection performance enhancement for both
simulated and measured data. Specifically, for simulated data, the PAMF methods with sample support satisfying
the "Brennan rule" perform close to the known-covariance MF. For simulated data and small sample support, the
performance of both PAMF statistics is still close to that of the MF curve. However, for cases wherein the number of
array elements is approximately the same as the number of pulses, the PAMF-LS maintains its level of performance,
but the PAMF-SN degrades significantly. Of relevance, the PAMF is tolerant to the presence of targets in the secondary
data, for both small and large secondary data sets. Comparative performance evaluation of the PAMF and other
dimensionality-reducing methods is an on-going activity, to be reported in the future.

Fig. 5. PAMF-LS and AMF Test Statistics (fixed-window filter)

A  current  limitation  of  the  PAMF  is  the  lack  of  CFAR  property.  This  is  due  to  the  dependence  of  the  estimation 
error in the filter parameter estimators on the "true" disturbance covariance. Furthermore, the estimation error varies
from one parameter estimation algorithm to another, with the PAMF-LS providing the most CFAR-like behavior.
Considerable progress towards CFAR-like behavior has been realized by utilizing the error covariance matrix estimated
using the prediction error filter residuals, rather than the model-based estimate obtained from the parameter estimation
algorithms. CFAR options for the PAMF are an area of current research.

Other current and future work covers the combination of the parametric-based processing method with detection rules
other  than  the  MF.  This  is  being  considered  for  the  case  of  compound-Gaussian  clutter  disturbance  statistics  also. 
Abstract

A  multiple  sensor  target  tracking  algorithm  is  presented.  The  algorithm  combines  polar 
coordinate data from a range-angle sensor, such as a radar, and an angle-only sensor, such
as an acoustic array, and outputs Cartesian coordinate track data. This non-linear estimation
problem  is  solved  using  a  set-based  estimation  technique,  which  does  not  rely  on  statistical 
assumptions  about  the  sensor  measurement  noise.  Issues  including  data  association  and 
data-fusion  are  automatically  solved  using  this  approach. 

1  Introduction 

Tracking  targets  using  sampled  noisy  range  and  angle  radar  measurements  is  a  difficult 
problem.  It  is  further  complicated  when  the  aim  is  to  combine  this  with  sampled  noisy 
angle-only  measurements  from  a  passive  sensor  [1].  There  are  several  reasons  for  this. 
Firstly,  target  tracking  is  generally  performed  in  an  x-y  Cartesian  coordinate  system  whereas 
sensor measurements are made in a range-bearing coordinate system. This results in a non-linear
estimation problem which is difficult to solve [2]. Secondly, tracking using passive
sensors  is  also  a  highly  non-linear  estimation  problem  and  also  results  in  an  unobservable 
system  which  makes  conventional  statistical  estimation  techniques  difficult  to  apply  [1,2]. 
Thirdly,  conventional  statistical  estimation  approaches  involve  making  assumptions  about 
the sensor measurement noise, most commonly that it is Gaussian stationary noise. The
problem  is  that  in  real  sensor  systems,  the  noise  is  biased,  non-Gaussian  and  non-stationary, 
due  to  effects  such  as  dynamic  video  quantization,  digital  sampling  quantization,  receiver 
saturation,  finite  numerical  precision,  etc.  Finally,  combining  data  from  active  and  passive 
sensors is generally treated as a separate problem, commonly termed jointly as 'data-association'
and 'data-fusion'. This is generally solved as a statistical hypothesis testing
problem  which  relies  on  the  statistical  assumptions  mentioned  previously,  and  these 
assumptions  are  often  flawed. 

This  paper  presents  an  alternative  approach  which  utilizes  set-based  estimation  principles  [5]. 
Set-based  estimation  assumes  as  little  as  possible  about  the  system  other  than  to  place 
bounds  on  those  quantities  to  be  estimated  or  any  uncertainties  (measurement  errors).  This 
leads to a recursive procedure whereby the estimate at time k is the set which is
consistent with an initial set $S_0$ and all the measurement sets up to time k, that is,
the intersection of $S_0$ with the measurement sets for time indices i = 0, ..., k.

A  fundamental  problem  is  to  obtain  feasible  representations  for  the  sets  [5].  A 
representation  must  accurately  and  tightly  bound  the  true  parameters  and  any  uncertainties, 
and  be  computationally  tractable.  Common  representations  include  polytopes  and  ellipsoids. 
In this paper, rectangles are the chosen representation. Although rectangles do not yield the
tightest bound, they are easily represented and provide a tractable solution to this problem.
Another important advantage of this approach is that it yields a robust algorithm despite the
highly non-linear nature of the problem. Finally, the data-association and data-fusion
problems  are  solved  automatically  using  this  approach.  Set-based  estimation,  also  commonly 
called  'bounded-error  estimation',  is  a  simple  technique  which  has  been  applied,  successfully, 
to  solving  target  tracking  problems  [3,4].  In  [4]  an  algorithm  similar  to  the  one  proposed 
here  is  developed  for  a  single  sensor  tracking  application.  This  paper  extends  the  work  to 
the  case  of  two  dissimilar  sensors. 
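The recursion with rectangular sets can be sketched as below. This is a minimal illustration only, assuming a stationary target so that no time-update step is applied between measurements; the names `Rect`, `intersect` and `set_estimate` are hypothetical and not taken from the paper.

```python
from typing import Iterable, NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle given by lower and upper bounds in x and y."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two axis-aligned rectangles, or None if they do not overlap."""
    r = Rect(max(a.x_min, b.x_min), min(a.x_max, b.x_max),
             max(a.y_min, b.y_min), min(a.y_max, b.y_max))
    if r.x_min > r.x_max or r.y_min > r.y_max:
        return None
    return r


def set_estimate(initial: Rect, measurement_sets: Iterable[Rect]) -> Optional[Rect]:
    """Set consistent with the initial set and every measurement set seen so far."""
    estimate = initial
    for m in measurement_sets:
        estimate = intersect(estimate, m)
        if estimate is None:  # a measurement inconsistent with the current estimate
            return None
    return estimate


estimate = set_estimate(Rect(0, 100, 0, 100),
                        [Rect(10, 25, 10, 17), Rect(17, 30, 3, 17)])
```

An empty intersection signals a measurement that is inconsistent with the current estimate; how such a case is handled is a design choice not covered by this sketch.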

An outline of this paper is as follows. Section 2 formulates the problem to be solved. The
sensor-target geometry, and the target and measurement models are presented. Section 3
presents  the  set-based  estimation  algorithm  that  performs  target  tracking  and  data- 
association  and  data-fusion.  Section  4  concludes  the  paper. 

2  Problem  formulation 

The  sensor  and  target  geometry  is  illustrated  in  Figure  1.  The  set-based  algorithm  presented 
here  recursively  computes  the  rectangular  set  over-bounding  the  x  and  y  coordinates  at  time 
index  k  that  are  consistent  with  all  the  observations  up  to  time  index  k. 


Figure  1  Sensor  and  target  geometry 


A  rectangular  set  is  aligned  with  the  x-y  coordinate  system  and  is  parameterized  by  upper 
and  lower  bounds  in  x  and  y;  the  current  state  estimate  is  an  element  of  the  current  set 
estimate  denoted  by 


(1) 


The  state  vector  for  this  estimation  problem  can  be  defined  by 
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s(^|/^  )=[^^  (4|/^  )  (k\k  )  (k\k  )  {k\k  )  (^|/^  )  )f . 

The state equation assumes a constant-heading, constant-velocity target and is given by
Such bounds are easily defined in practice and are more robust than assuming a statistical
distribution.  Referring  to  equation  (1),  note  that  the  state  is  defined  in  Cartesian  coordinates, 
therefore  the  measurements  must  be  transformed  from  polar  to  Cartesian  coordinates. 
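One way to carry out such a transformation for bounded measurements is to over-bound the polar measurement set (a range interval and a bearing interval) by an axis-aligned rectangle. The sketch below is an illustration under stated assumptions, not the paper's procedure: the sensor is taken to sit at the origin, bearings are in radians measured counterclockwise from the x axis, and the function names are hypothetical.

```python
import math


def _cos_range(lo, hi):
    """Return (min, max) of cos(theta) over theta in [lo, hi], with lo <= hi."""
    values = [math.cos(lo), math.cos(hi)]
    two_pi = 2.0 * math.pi
    # cos reaches +1 at multiples of 2*pi; check whether one lies in [lo, hi].
    if two_pi * math.ceil(lo / two_pi) <= hi:
        values.append(1.0)
    # cos reaches -1 at pi plus multiples of 2*pi.
    if math.pi + two_pi * math.ceil((lo - math.pi) / two_pi) <= hi:
        values.append(-1.0)
    return min(values), max(values)


def polar_box_to_rect(r_lo, r_hi, th_lo, th_hi):
    """Bounding rectangle (x_min, x_max, y_min, y_max) of the set
    {(r cos th, r sin th) : r_lo <= r <= r_hi, th_lo <= th <= th_hi}, with r_lo >= 0."""
    c_lo, c_hi = _cos_range(th_lo, th_hi)
    # sin(theta) = cos(theta - pi/2), so the same helper gives the sine range.
    s_lo, s_hi = _cos_range(th_lo - math.pi / 2.0, th_hi - math.pi / 2.0)
    # x = r*c and y = r*s are bilinear, so their extremes occur at interval end points.
    xs = [r * c for r in (r_lo, r_hi) for c in (c_lo, c_hi)]
    ys = [r * s for r in (r_lo, r_hi) for s in (s_lo, s_hi)]
    return min(xs), max(xs), min(ys), max(ys)


radar_rect = polar_box_to_rect(9.0, 11.0, -0.1, 0.1)
```

For an angle-only sensor the same function can be called with `r_lo = 0` and `r_hi` set to an assumed maximum detection range.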
3  Set-based tracking algorithm
Step 1 yields the following boundaries:

(x  <  25,  j  <  17), (x  >  17,  j  <  17),(x  <  18,  j  >  10), (x  >  10,  j  >  10) 
(x  <  17,7  >  9),(^  ^  17,  J  <  17),(x  >11,7  >  3),(x  >  11,7  ^  H) 


Step  2  yields  the  following  consistent  boundaries: 

(x  >  17,7  <  17), ^  17,7  ^  17),(^  ^  11,  J  ^  11) 


Step  3  yields  the  following  final  boundaries 
Note in this case the boundaries have not changed. This is due to the fact that rectangles
do not tightly over-bound the intersection of two sets. Step 4 has been omitted for
simplicity. Finally, the set-based tracking algorithm is easily initialized by assigning an
arbitrarily large set to the initial set estimate.

4  Conclusions 

Set  based  estimation  techniques  provide  the  basis  for  robust  and  simple  tracking  where  one 
is  required  to  work  with  non-linear  coordinate  systems  or  where  assumptions  of  Gaussian 
and  stationary  noise  are  inappropriate.  The  set  based  tracker  also  has  the  benefit  of  simple 
implementation  in  fixed  point  arithmetic  microprocessors.  It  is  well  known  that  in  practical 
tracking systems, ad hoc techniques are usually employed to improve Kalman filter stability,
including fixing Kalman gains after a few track updates or reverting to alpha-beta trackers
with fixed gains. Set based tracking has no such problems as there is no matrix inversion.
The technique also lends itself well to manoeuvre detection.

The  technique  described  in  this  paper  is  being  investigated  as  the  basis  for  a  distributed 
sensor  tracking  system  including  integrated  data  association  and  sensor  management  where 
there  are  communication  bandwidth  constraints. 

Simulation  results  will  be  presented  at  the  workshop. 
Abstract

We  propose  an  approach  to  achieve  high-performance  localization  of  multiple  sources  using  a  small 
aperture array of spatially-distributed electric and magnetic component sensors. The approach
is based on exploiting all available electromagnetic information along with the time delay
information. Using simulated data, we demonstrate that this approach outperforms both a single
vector-sensor  and  scalar-sensor  arrays  in  accuracy  of  direction-of-arrival  (DOA)  estimation. 

1  Introduction 

The  problem  of  estimating  electromagnetic  wave  parameters  using  sensor  arrays  has  attracted 
significant attention over recent years and led to the development of a number of high resolution
algorithms,  such  as  MUSIC,  ESPRIT  and  WSF.  These  algorithms  have  focused  on  direction-of- 
arrival  estimation  in  such  areas  as  wireless  communications  and  radar. 

Most  existing  array  processing  methods  rely  on  the  spatial  diversity  of  the  sensor  array  to 
estimate  the  DOA.  A  drawback  of  this  approach  is  that  the  performance  accuracy  becomes  highly 
dependent  on  the  size  of  the  array’s  electrical  aperture.  In  many  applications,  the  array  is  expected 
to operate over a wide frequency range. To avoid ambiguities in the array manifold, the physical
size of such a broadband array is constrained by the highest operating frequency and the number
of sensors. Poorer performance at lower frequencies will result due to their larger wavelengths,
especially when only a small number of receiver channels is available. A costly approach to alleviate
this problem is to aim for a larger "unambiguous" array geometry by increasing the number of
receiver  channels.  Another  way  to  overcome  this  problem  is  to  use  multiple  sets  of  sensor  arrays 
where  each  set  is  optimized  to  operate  over  a  smaller  bandwidth.  This  may  not  be  feasible  in 
mobile-  or  fast-deployment  sensor  array  applications.  Hence,  there  is  a  need  to  develop  DOA 
estimation methods that use a small-aperture array and achieve good performance over a wide
operating  frequency. 

The DOA estimator's performance can be improved by using a polarization-sensitive sensor array
to  exploit  the  polarization  diversity  of  the  signals  by  estimating  their  signal  polarization  parameters 
along  with  their  DOA  [2]  [3]  [4].  In  a  recent  development,  Nehorai  and  Paldi  [1]  introduced 
the  concept  of  vector-sensor  array  processing  where  the  complete  electromagnetic  information 
of  the  signal  is  measured  and  processed.  They  apply  the  Poynting  relationship  between  the 
electric  and  magnetic  measurements  to  enable  estimation  of  the  DOA  of  multiple  signal  sources 
using  a  single  vector-sensor.  Direction-finding  with  a  vector  sensor  (SuperCART  antenna  array) 
was  demonstrated  in  [7].  Since  it  does  not  rely  on  spatial  diversity,  a  DOA  estimator  using  a 
single  vector  sensor  should  exhibit  consistent  performance  over  its  operating  frequency  band  and 
should easily work with wide-band signals [1]. When operating as an array of vector sensors, the
electromagnetic and time-delay measurements can be simultaneously used to estimate the DOAs.
This allows the use of smaller-aperture arrays while maintaining good performance over a wide
frequency bandwidth. However, employing an array of vector sensors may be expensive because
a large number of receivers is necessary. For example, an array of three vector sensors will require an
18-channel receiver.
This paper proposes a simple and effective alternative for achieving good DOA estimation
performance with small-aperture arrays. The approach uses an array of spatially distributed scalar
magnetic and electric sensors. We shall call the proposed array a distributed electromagnetic
component array (DEMCA). It is assumed that the array of scalar magnetic and electric sensors
should, in aggregate, measure at least all the 3D electric and magnetic components of the
electromagnetic wave. The proposed DEMCA will afford the following three advantages. Firstly,
the full electric and magnetic field components are measured by the magnetic and electric sensors,
thereby enabling derivation of the sources' directional information. Secondly, their spatial
distribution will allow extraction of additional directional information about the sources by way of the
differential-delay measurements. Finally, DEMCA's structure will significantly economize the
number of receivers needed to simultaneously utilize the time-delay and complete electromagnetic
information for DOA estimation.


2  Measurement  Model 


Adopting  the  conventions  in  [1],  the  measurement  model  of  the  vector  sensor  is  given  by 


where 
u is the unit direction vector from sensor to source and $u_x$, $u_y$ and $u_z$ are its x, y and z components.
The  matrices  V,  Q  and  vector  w  are  given  by 


V  = 
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Q  = 
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and 

w  = 

where $\theta_1$, $\theta_2$, $\theta_3$ and $\theta_4$ are the azimuth, elevation, polarization ellipse orientation and eccentricity angles.

Extending  from  (1)  and  assuming  that  the  signal  sources  are  narrowband,  we  can  write  the 
measurement  model  of  the  distributed  component  sensor  array  in  a  multiple  source  environment 
as  [6] 
where $\theta^{(k)} = [\theta_1^{(k)}, \ldots, \theta_4^{(k)}]$ denotes the directional and polarization parameters of the $k$th
source signal. $\Gamma(\theta_1,\theta_2)$ is a diagonal matrix whose $n$th diagonal entry is given by
$[\Gamma(\theta_1,\theta_2)]_{nn} = a_n(\theta_1,\theta_2)\,e^{j\omega_c\tau_n}$, where $\tau_n$ is the differential delay of the signal source
between the $n$th component and the phase center and $a_n(\theta_1,\theta_2)$ is the response of the component sensor;
$\omega_c$ is the carrier frequency and $\Omega$ is a selection matrix with elements of 1 and 0. For example, when
orthogonal triads of magnetic and electric sensors are used, $\Omega = I_6$. If an additional x electric-component
sensor is used, the selection matrix becomes

$$\Omega = \begin{bmatrix} 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0 \\ I_6 \end{bmatrix} \qquad (7)$$
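The role of the selection matrix in (7) can be illustrated with a small sketch. The ordering of the six field components as [Ex, Ey, Ez, Hx, Hy, Hz] is an assumption made here for illustration (it is consistent with the row [1 0 0 0 0 0] selecting the x electric component), and the function names are hypothetical.

```python
def identity(n):
    """n-by-n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def selection_matrix(extra_components, n_components=6):
    """Stack one selection row per additional component sensor on top of I_6.

    extra_components lists the indices of the duplicated field components,
    using the assumed ordering [Ex, Ey, Ez, Hx, Hy, Hz].
    """
    extra_rows = [[1 if j == c else 0 for j in range(n_components)]
                  for c in extra_components]
    return extra_rows + identity(n_components)


def apply_selection(omega, field):
    """Map the six field components onto the sensor outputs."""
    return [sum(w * f for w, f in zip(row, field)) for row in omega]


omega = selection_matrix([0])            # one additional x electric sensor
outputs = apply_selection(omega, [1, 2, 3, 4, 5, 6])
```

Each row of the matrix picks the field component measured by one scalar sensor, so the number of rows equals the number of receiver channels.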
From (6), observe that the directional information of the electromagnetic sources is all embedded in

$$\begin{bmatrix} I_3 \\ (u_k \times) \end{bmatrix} V_k.$$


This  allows  the  differential  delay  measurements  resulting  from  diverse  placement  of  the  component 
sensors  and  electromagnetic  field  measurements  to  be  jointly  exploited  in  estimating  the  source 
parameters.  Given  both  the  complete  electromagnetic  and  spatial  information,  good  parameter 
estimation  with  a  smaller  aperture  array  can  be  expected  over  a  wide  frequency  range.  It  suffices  to 
point  out  that  the  distributed-component  sensors  array  model  in  (6)  generalizes  the  vector-sensor 
array  [6]. 

We  can  express  (6)  compactly  in  matrix  form  as 


$$y(t) = A s(t) + n(t) \qquad (8)$$

where

$$A = [a(\theta^{(1)}) \cdots a(\theta^{(d)})] \qquad (9)$$

and $s(t) = [s_1(t) \ldots s_d(t)]^T$.


3  Cramer  Rao  Bound 


We use the Cramer-Rao bound (CRB) to examine the performance gain achievable by our approach.
Using the notation, statistical assumptions and results in [1] [6], the CRB is given by

$$C_{CRB}(\theta) = \frac{\sigma^2}{2N}\,\mathrm{Re}\{J^{-1}\}, \qquad (10)$$

$$J = \mathrm{btr}\big((1 \boxtimes U) \boxdot (D^H \Pi_c D)^{bT}\big)$$

where

$$P = E(s(t)s^H(t)),$$

and where $\sigma^2$ is the noise power and N is the number of independent snapshots. In order to
circumvent the intrinsic singularities due to the reference coordinate system, the mean square
angular error (MSAE) was proposed in [1] and is given by

$$\mathrm{MSAE}_{CR} \triangleq N[\cos^2\theta_2\, C_{CRB}(\theta_1) + C_{CRB}(\theta_2)]. \qquad (11)$$
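As a small numerical illustration of (11), the sketch below evaluates the MSAE bound from given azimuth and elevation CRB values. The function name and the sample numbers are hypothetical; the CRB values would in practice come from (10) and are simply assumed here, in rad^2.

```python
import math


def msae_cr(n_snapshots, elevation_rad, crb_azimuth, crb_elevation):
    """MSAE bound: N * [cos^2(theta_2) * CRB(theta_1) + CRB(theta_2)]."""
    return n_snapshots * (math.cos(elevation_rad) ** 2 * crb_azimuth
                          + crb_elevation)


# Hypothetical CRB values for a source on the horizon (elevation 0).
bound_horizon = msae_cr(100, 0.0, 1e-4, 2e-4)
# Near the pole the azimuth term is weighted out, which is how the
# coordinate-system singularity is avoided.
bound_pole = msae_cr(1, math.pi / 2, 1.0, 0.5)
```

The cos^2 weighting reflects that an azimuth error corresponds to a smaller angular displacement as the elevation approaches 90 degrees.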


4  Numerical  Example 

By using a numerical example, we shall demonstrate the greater efficacy of the distributed
electromagnetic component sensor array (DEMCA) processing when it is compared with scalar-array
processing that relies on electric-only, diversely polarized and co-polarized antenna arrays. Since
the motivation of this development is the design of a small-aperture sensor array, we shall make
the comparison based on the principle of "equal aperture, equal number of channels". We assume
a six-channel receiver and use a six-element uniform circular array in this analysis. This will allow
the comparison of the performance of a vector sensor, as well as six-element diversely polarized and
co-polarized arrays, with the proposed DEMCA. The diversely-polarized array used in this study is
an array of x, y and z-electric component sensors. The difference between the diversely-polarized
and the proposed sensor array is that the former uses only electric component sensors while the
latter uses both electric and magnetic component sensors to form a six-element sensor array with a
six-channel receiver. The three sensor arrays are depicted in Figure 1.

Figure 1: Array geometry of the distributed EM component sensor array, the x-electric (co-polarized) array
and the electric-only diversely polarized array. $E_x$ ($H_x$), $E_y$ ($H_y$) and $E_z$ ($H_z$) are the electric (magnetic)
component sensors.

Note that the inter-element
spacing  is  fixed  at  Amztx  —  —  >  where  c  is  the  speed  of  light  and  /max  is  the  maximum  operating 
frequency. 

An example of the DOA estimation performance as a function of frequency is shown in Figure 2.
We considered two uncorrelated sources with parameter vectors [1°, 10°, 45°, 0°]^T and [5°, 9°, -45°, -5°]^T.

The signal-to-noise ratio is fixed at 10 dB. Therein the inter-element spacing of the uniform circular
array  is  fixed  at  Observe  from  the  figure  that  the  distributed  EM  sensor  array  has  consistent 

performance over a wide operating bandwidth. In addition, it achieved four orders of magnitude
of gain in accuracy of DOA estimation over the x electric array and one order of magnitude over
the electric-only, diversely-polarized array at a normalized operating frequency of 0.3. This result
clearly demonstrates the gain obtainable from the full exploitation of the spatial and electromagnetic
information afforded by DEMCA.

Figure 3 plots the DOA estimation performance as a function of the azimuthal angle of separation
between two uncorrelated sources having 10 dB SNR. The normalized operating frequency is
fixed at 0.3. The graph shows that the proposed DEMCA demonstrates significant performance
gain, especially for closely-spaced sources. This feature is particularly useful in applications
with short integration time or at low signal-to-noise ratio.


5  Concluding  Remarks 

We  have  presented  a  new  approach  for  the  localization  of  electromagnetic  sources  through  the  joint 
exploitation  of  spatial  diversity  and  electromagnetic  information  using  spatially-distributed  electric 
and magnetic component sensors. Performance analysis via numerical examples illustrated the
potential  gain  of  the  proposed  approach  over  the  scalar  and  diversely  polarized  array.  The  analysis 
indicated  that  the  distributed  component  EM  sensor  array  should  allow  the  use  of  small  array 
apertures  while  maintaining  desired  resolution  and  performance  accuracy  over  a  wide  operating 
bandwidth. 
Support vector machines are investigated as a method for performing
nonlinear equalization in communication systems. The support vector
machine has the advantage that a smaller number of parameters for the model
can be identified in a manner that does not require the extent of prior
information or heuristic assumptions that some previous techniques require.
An enhanced method of using a bank of support vector machines allows
utilization of previously detected symbols to increase performance.
Simulation results are compared against results from other researchers using
techniques such as neural networks and decision feedback. We find that
the support vector machine generalizes well given a set of training data,
with the tradeoff typically being decreased performance from the optimum
Bayesian solution.


Key Words: nonlinear equalization; support vector machine; SVM; ISI; decision feedback


0.  INTRODUCTION 

A support vector machine (SVM) [2] embodies the concepts of generalized learning
theory developed by Vapnik [8] and others. An SVM uses training data as an
integral element of the function estimation model as opposed to simply using training
data to estimate parameters of an a priori model using maximum likelihood
techniques. In this sense, it is more general in terms of noise and correlation properties
than methods such as radial basis function (RBF) networks [4]. Furthermore,
the optimization, or learning, method of SVMs is more manageable and generalizes
better than techniques such as Volterra filters [1, 5] and neural networks [3].

These  are  strong  motivations  for  investigating  the  use  of  SVMs  for  nonlinear 
equalization,  or  more  appropriately  detection.  In  the  case  of  equalization,  it  is 
desirable  that  a  modem  require  a  small  set  of  training  data  to  characterize  the 
transmission  channel.  Also,  the  model  should  be  efficient  for  real-time  applications. 
SVMs  train  with  relatively  small  amounts  of  data,  and  once  training  of  the  SVM 
has  completed,  the  detection  stage  is  efficient,  comparable  to  Volterra  filters  and 
neural  networks. 



SEBALD  AND  BUCKLEW 


FIG. 1. Model of the nonlinear transmission system, originating from Chen et al. [3].


Standard SVM results are compared against those for a neural network presented
in [3]. Then results for an SVM-bank (SVMB) structure are compared against those
for a decision-feedback equalizer presented in [6]. The latter system has the
advantage that it uses detected symbols to modify decision boundaries. The result is
increased performance over the standard SVM in typical communication channels
having appropriate signal-to-noise ratios (SNR).

1.  DETECTION  VIEWED  AS  PATTERN  RECOGNITION 

As  pointed  out  in  [3],  equalization  may  be  viewed  as  a  classification  problem. 
In  such  a  scenario,  the  output  of  a  communications  channel  can  be  grouped  as 
a  vector  and  used  as  the  input  to  a  classification  machine  whose  output  should 
match  as  best  as  possible  some  delayed  version  of  the  original  signal  entering  the 
channel.  Figure  1  shows  a  discrete-time  pulse  amplitude  modulation  model  of  a 
communications channel. The transmitted data sequence, $u(n)$, is an independent,
equiprobable binary sequence taking values $\{-1, +1\}$. The output of the channel,
$x(n) \in \mathbb{R}$, is the sum of a deterministic, nonlinear function of $u(n)$, denoted $\bar{x}(n)$, and an
additive noise, $e(n)$. The goal of the equalizer is then to mimic the desired output
$u(n-D)$. Call the equalizer detection output $\hat{u}(n-D)$.

The deterministic portion of the channel model consists of a linear, finite impulse
response (FIR) filter followed by a polynomial nonlinearity. Let

$$\tilde{x}(n) = \sum_{k=0}^{N-1} h_k u(n-k), \quad \text{and} \quad \bar{x}(n) = \sum_{p=1}^{P} c_p \tilde{x}^p(n),$$

where $\{h_k\}$ are the FIR filter coefficients and $\{c_p\}$ are the polynomial coefficients.
By grouping the output of the channel into vectors

$$\mathbf{x}(n) = [x(n)\ \ x(n-1)\ \ \ldots\ \ x(n-M+1)]^T$$

and taking the desired classification to be a training sequence input to the channel
delayed by D samples, i.e., $y_n = u(n-D)$, an SVM can be trained to solve the
detection problem. Varying the delay D results in different performance of the
equalizer because the correlation between $u(n-D)$ and $\mathbf{x}(n)$ changes with D.
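The channel model and the construction of training pairs can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the noise is assumed Gaussian (the text only says additive), and the FIR coefficients used in the example are placeholders rather than values from the paper.

```python
import random


def simulate_channel(u, h, c, noise_std, rng):
    """FIR filter, then polynomial nonlinearity, then additive noise (assumed Gaussian)."""
    x = []
    for n in range(len(u)):
        # Linear FIR stage; inputs before time 0 are taken as zero.
        x_lin = sum(h[k] * u[n - k] for k in range(len(h)) if n - k >= 0)
        # Polynomial stage: c[p-1] multiplies the p-th power, p = 1..P.
        x_bar = sum(c[p - 1] * x_lin ** p for p in range(1, len(c) + 1))
        x.append(x_bar + rng.gauss(0.0, noise_std))
    return x


def make_training_set(u, x, M, D):
    """Pair each vector [x(n), ..., x(n-M+1)] with the label y_n = u(n-D)."""
    patterns, labels = [], []
    for n in range(max(M - 1, D), len(u)):
        patterns.append([x[n - m] for m in range(M)])
        labels.append(u[n - D])
    return patterns, labels


rng = random.Random(0)
u = [rng.choice((-1, 1)) for _ in range(200)]
h = [1.0, 0.5]            # placeholder FIR coefficients (assumption)
c = [1.0, 0.0, -0.9]      # x_bar = x_lin - 0.9 * x_lin**3
x = simulate_channel(u, h, c, noise_std=0.2, rng=rng)
patterns, labels = make_training_set(u, x, M=2, D=1)
```

The resulting `patterns` and `labels` are the kind of input a classifier would be trained on; changing `D` changes which transmitted symbol each pattern is asked to predict.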


NONLINEAR  EQUALIZATION  USING  SVMS 




2.  SVM 

An SVM is a method for separating clouds of data in the feature space, i.e., the
space generated from nonlinear mappings of the pattern space data $\mathbf{x}(n)$, using an
optimal hyperplane. Given an input vector $\mathbf{x}$, an SVM classifies according to

$$\hat{y} = \mathrm{sign}\{f(\mathbf{x})\}$$

where $\hat{y}$ is the estimate of the classification and

$$f(\mathbf{x}) = \sum_{i \in S} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b. \qquad (1)$$

Here $\{\alpha_i\}$ are Lagrange multipliers, $S$ is the set of indices for which $\mathbf{x}_i$ is a support
vector, i.e., a vector for which $\alpha_i \neq 0$ after optimization, and
$K(\cdot,\cdot): \mathbb{R}^M \times \mathbb{R}^M \to \mathbb{R}$ is a kernel satisfying the conditions of Mercer's theorem [7, 8]. We see
in (1) that after training, only a subset of the training data enters the model (i.e.,
data reduction) and operations are only performed on data in the pattern space,
not in the feature space (i.e., more manageable than previously studied nonlinear
techniques).

According to Vapnik [8], for training data which is non-separable the dual
optimization problem is to maximize

$$\sum_{i=1}^{L} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{L} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$

under constraints

$$\sum_{i=1}^{L} \alpha_i y_i = 0, \quad \text{and} \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, L,$$

where increasing C penalizes errors more heavily [2]. This is a quadratic programming
problem that may be solved with traditional optimization techniques.

Preliminary studies on character recognition problems [8] suggest that the type
of SVM kernel is inconsequential so long as the capacity is appropriate for the
amount of training data and the complexity of the classification boundary. In the
simulations here the polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z} + 1)^d$ is used exclusively.
The polynomial order $d$ is a parameter that controls the capacity of the SVM:
the greater $d$, the more complex the classification boundary the SVM can create.
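As a rough illustration of how a trained SVM of this kind is evaluated, the sketch below applies the decision rule of (1) with the polynomial kernel. The function names and the convention of returning +1 when $f(\mathbf{x}) = 0$ are assumptions made here; the support vectors, multipliers and bias would come from solving the quadratic program, which is not shown.

```python
import numpy as np

def poly_kernel(x, z, d=3):
    """Polynomial kernel (x . z + 1)^d operating on pattern-space vectors."""
    return (np.dot(x, z) + 1.0) ** d

def svm_decide(x, support_vectors, labels, alphas, b, kernel=poly_kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b and return its sign.

    Only the support vectors enter the sum; a value of exactly zero is mapped
    to +1 here, which is an arbitrary tie-breaking choice.
    """
    f = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if f >= 0 else -1
```

Note that the kernel is evaluated only between pattern-space vectors, which is the point made in the text: the feature space is never constructed explicitly.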

3.  SIMULATIONS  AND  RESULTS 

The constellation points are the noise-free channel outputs resulting from the
various inputs and are classified according to the value of $D$. Let the noise-free
channel outputs be grouped as a vector $\hat{\mathbf{x}}(n) = [\, \hat{x}(n) \;\; \hat{x}(n-1) \;\; \ldots \;\; \hat{x}(n-M+1) \,]^T$.
Then the constellation sets and classification regions can be expressed as
$C_{s_i,D} = \{ \hat{\mathbf{x}}(n) \mid u(n-D) = s_i \}$ and $R_{s_i,D} = \{ \mathbf{x}(n) \mid \hat{u}(n-D) = s_i \}$, $i = 1, 2$. The
optimum classifier assumed here is the Bayesian maximum likelihood detector under
conditions of equiprobable symbols and zero/one cost [7].
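The constellation sets can be enumerated directly for a short channel. The sketch below is illustrative only: the function name is hypothetical, binary symbols of ±1 are assumed, and the polynomial coefficients are passed with `c[0]` multiplying the first power.

```python
from itertools import product

def constellation(h, c, M, D, symbols=(-1.0, 1.0)):
    """Group noise-free output vectors by the value of the delayed symbol u(n-D).

    h: FIR coefficients; c: polynomial coefficients (c[0] multiplies the first power).
    Returns a dict mapping each symbol to a set of M-dimensional tuples.
    """
    N = len(h)
    span = N + M - 1                       # number of inputs u(n), ..., u(n-span+1) involved
    if not 0 <= D < span:
        raise ValueError("D must index one of the inputs that affect the vector")
    sets = {s: set() for s in symbols}
    for u in product(symbols, repeat=span):    # u[j] plays the role of u(n-j)
        vec = []
        for m in range(M):
            lin = sum(h[k] * u[m + k] for k in range(N))
            out = sum(cp * lin ** (p + 1) for p, cp in enumerate(c))
            vec.append(round(out, 10))     # rounding merges numerically equal points
        sets[u[D]].add(tuple(vec))
    return sets
```

For a two-tap filter with $M = 2$, three input symbols influence each vector, so there are eight constellation points, split between the two sets according to $u(n-D)$.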

TABLE 1

Simulation statistics for a standard SVM detector.

D   P_e,train   P_e,test   P_e,unproc   sigma_e,train   sigma_e,test   sigma_e,unproc   N_sv   N_msv
0   0.163       0.172      0.602        0.0166          0.00722        0.00825          205    10.0
1   0.0458      0.0589     0.905        0.00727         0.00473        0.00408          66.5   9.40
2   0.0366      0.0421     0.503        0.00800         0.00575        0.00506          51.1   9.30


The first simulation is with the nonlinear channel $\hat{x}(n) = \tilde{x}(n) - 0.9\,\tilde{x}^3(n)$,
$\tilde{x}(n) = u(n) + 0.5\,u(n-1)$, where $\tilde{x}(n)$ is the FIR filter output and $\hat{x}(n)$ is the
noise-free channel output, additive white Gaussian noise of power $\sigma_e^2 = 0.2$, and SVM
parameters $M = 2$, $C = 5$ and $d = 3$. Results are an average of ten trials with
500 samples in the training set and 5000 samples in the test set. Statistics for the
simulations are given in Table 1. Most encouraging is that the standard deviations
for probability of error for the SVM are approximately one eighth of the probability
of error, and the error probabilities for training and test data are approximately
the same. This indicates that the SVM generalizes well.

We now test the behavior of the SVM on a channel having zero-mean, colored
noise with $E[e(n)e(n-1)] = \rho$. Figure 2 shows an example SVM classifier for the
channel $\hat{x}(n) = \tilde{x}(n) + 0.1\,\tilde{x}^2(n) + 0.05\,\tilde{x}^3(n)$, $\tilde{x}(n) = 0.5\,u(n) + u(n-1)$, $\sigma_e^2 = 0.2$,
$\rho = 0.48$, $M = 2$, $D = 0$, $d = 3$, and $C = 5$. The number of trials, training samples
and test samples were the same as in the previous simulation. Region $R_{+1,D}$ is
shaded while $R_{-1,D}$ is left unshaded. Included on the pattern space is the signal
constellation, where $C_{+1,D}$ is marked by a large • and $C_{-1,D}$ is marked by a large
×. Training data are also displayed, using a small o to indicate $u(n-D) = +1$ and a
small × to indicate $u(n-D) = -1$. The optimum Bayesian solution is given in [3].
The SVM chooses a decision boundary that is similar to the optimum and is consistent
with the training data. The optimum $R_{-1,0}$ for this example includes a disconnected
region, but the SVM cannot match the polygonal nature of the optimum.

A third example compares the SVM bit error rate (BER) against the optimum
BER as a function of SNR for the channel $\hat{x}(n) = \tilde{x}(n) + 0.2\,\tilde{x}^2(n)$,
$\tilde{x}(n) = 0.3482\,u(n) + 0.8704\,u(n-1) + 0.3482\,u(n-2)$, $\rho = 0.48$, $M = 3$, $D = 1$, $d = 3$,
and $C = 0.1$. The number of training samples was again 500, while the number of
trials and test samples were varied to compensate for greater relative variance for
low BER estimates. The BER results, given in Fig. 3, show that the SVM requires
approximately 2.0-2.5 dB more SNR to match the Bayesian performance. Above
15 dB SNR, the SVM result is essentially the same as the neural network solution
given in [3].

The  last  simulation  considers  channels  having  severe  ISI  for  which  the  standard 
SVM  detector  does  not  perform  well.  It  has  been  shown  for  linear  channels  [6] 
that,  depending  upon  the  nature  of  the  ISI,  significantly  better  performance  can 
often  be  achieved  by  incorporating  previously  detected  symbols  into  the  detector. 
This  concept  is  exemplified  in  the  decision  feedback  equalizer  (DFE).  The  decision 
feedback  idea  can  be  incorporated  into  an  SVM  by  simply  lengthening  its  input 
vector  by  appending  previous  SVM  outputs.  That  is,  let 

\[ \mathbf{x}(n) = [\, x(n) \;\; x(n-1) \;\; \ldots \;\; x(n-M+1) \;\; \hat{u}(n-D-1) \;\; \ldots \;\; \hat{u}(n-D-M') \,]^T \]

where $M'$ is the number of previously detected symbols fed back. We call this
approach the decision feedback SVM (DFSVM). A DFSVM with correct decisions
fed back is called the perfect DFSVM (PDFSVM).

FIG. 2. Example of typical classification region of a standard SVM detector.

FIG. 3. ... SVM detector versus the Bayesian detector.

FIG. 4. A bank of SVMs with selection based upon the state of previous decisions. Each SVM
constructs a different decision boundary based upon the constellation for a given state.
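A minimal sketch of how the lengthened DFSVM input vector could be assembled is given below. The function and argument names are hypothetical; passing the true symbols instead of past decisions as `decisions` corresponds to the PDFSVM case.

```python
def dfsvm_input(x, decisions, n, M, D, M_fb):
    """Build [x(n), ..., x(n-M+1), d(n-D-1), ..., d(n-D-M_fb)].

    x: received samples; decisions: previously detected symbols, indexed by the
    time of the symbol they estimate; M_fb: number of symbols fed back (M' in the text).
    """
    if n - M + 1 < 0 or n - D - M_fb < 0:
        raise IndexError("not enough past samples or decisions at this n")
    channel_part = [x[n - m] for m in range(M)]
    feedback_part = [decisions[n - D - j] for j in range(1, M_fb + 1)]
    return channel_part + feedback_part
```

During detection the list of decisions grows one symbol at a time, so the vector for time $n$ can only be formed once the decision for time $n - D - 1$ has been made.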

Chen et al. [4] described an important property of the pattern space related to
previously detected symbols and proposed a novel method of utilizing this property
in their system. The SVM can be altered to utilize the same ideas by building a
bank of SVMs (denoted SVMB in Fig. 5) controlled by a state machine having
previously detected symbols as an input. This approach is illustrated in Fig. 4.
Individual SVMs are optimized using subsets of training data conditioned upon
previous symbols.
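The state-machine selection can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: the bank is represented as a dictionary from a tuple of previous decisions to any callable classifier, and both function names are hypothetical.

```python
def split_by_state(X, y, states):
    """Partition training pairs by the previous-symbol state observed with each vector."""
    subsets = {}
    for xi, yi, s in zip(X, y, states):
        subsets.setdefault(tuple(s), []).append((xi, yi))
    return subsets

def detect_with_bank(X, bank, initial_state):
    """Classify each vector with the classifier selected by the previous decisions.

    bank: dict mapping a state tuple (most recent decision first) to a callable
    that returns +1 or -1 for a pattern-space vector.
    """
    state = tuple(initial_state)
    decisions = []
    for x in X:
        d = bank[state](x)             # only the classifier for the current state is used
        decisions.append(d)
        state = (d,) + state[:-1]      # shift the newest decision into the state
    return decisions
```

Each classifier in the bank would be trained on the subset returned by `split_by_state` for its own state, so it only has to separate the constellation points that can occur in that state.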

The channel model for this last example is that studied by Proakis in [6] for the
DFE. That channel is linear, i.e., $\hat{x}(n) = \tilde{x}(n)$, with output
$\tilde{x}(n) = 0.407\,u(n) + 0.815\,u(n-1) + 0.407\,u(n-2)$. The parameters for the various types of SVM in
the simulation were $M = 2$, $D = 1$, $d = 3$, and $C = 0.5$. For the DFSVM and
PDFSVM, $M' = 1$. The added noise was white. The number of training samples
was 500, and results were averaged over ten trials. The number of test points was
varied to account for variation in BER. The error probability of optimum non-ISI
binary signaling, $P_{e,opt}$, is expressed in terms of erfc(·), the complementary error
function, and $\gamma$, the SNR for a real channel. Figure 5 shows the performance as a
function of SNR for the various SVM approaches. The SVMB outperforms the DFSVM,
requiring approximately 2.0 dB less SNR to achieve the same BER, and it performs
about the same as the DFE of [6].

FIG. 5. Average BER for the SVM, SVMB, DFSVM, and PDFSVM detectors compared
against the optimum binary, non-ISI detector.


4.  SUMMARY 

Simulations have shown that the SVM provides a robust method for addressing
nonlinearities in communication channels exhibiting ISI. The method performs
as well as neural networks and Volterra filters, and has several advantages over
these methods. The use of a bank of SVMs controlled by a state machine allows
incorporation of decision direction. This significantly increases performance for
certain channel scenarios. Currently, the SVM can only be used in block adaptive
applications since no sample-by-sample adaptive version is known to exist.


Abstract

This paper represents the first phase of an ongoing investigation into the nature of
sinusoidal types of random processes associated with real world phenomena. While
perfect sinusoids exist only in the mathematical sense, their use as an approximation
model in relation to real world phenomena has been, and continues to be, widespread,
often with much value. The motivation for this work is the belief that knowledge of the
deviation of real signals from such a model can provide additional useful information.
The focus of this paper is on sinusoids in relation to random processes associated with
rotating machinery. The tools used include mathematical limit theorem results, standard
signal processing tools including spectral estimation and Kalman filtering, and basic
statistics. Some noteworthy results include the normality of amplitude and frequency,
characterization of the same as stationary random processes, and potential to improve
condition monitoring of rotating machinery.

1.  Introduction 

The  concept  of  a  sinusoid  arises  in  the  study  of  phenomena  in  practically  every  area  of 
science  and  engineering.  Examples  include  vibration  of  rotating  machinery  [SHW], 
species  extinction  rates  [RAS],  earthquake  prediction  [SAP],  atmospheric  wind  profiles 
[WIS],  sonar  [JOH],  heart  rate  variability  [MMM],  and  music  quality,  to  name  just  a  very 
few.  However,  a  true  sinusoid  only  exists  in  the  mathematical  sense.  Even  a  sine  wave 
generator  does  not  generate  a  perfect  sinusoid.  A  perfect  sinusoid  is  characterized  by 
three  constants,  namely,  amplitude,  frequency  and  phase.  Because  the  phase  parameter 
reflects  only  the  relation  of  the  sinusoid  to  the  time  at  which  the  sinusoid  is  first 
observed, the value of this parameter is dictated by the observer. Both amplitude and
frequency, however, are subject to change over time, and so it is these parameters that
will be the focus of this paper. Note that the frequency of a sinusoid is simply the
inverse of its period; this matters because in many applications it is period variability,
and not frequency variability, that is of interest.

Amplitude and frequency (or period) variability can occur slowly over time (relative
to the period of the sinusoid), as well as within a single period. Here, the period refers
to the time period associated with 360 degrees of angle. In many rotating machinery
applications the speed of the machine will vary to some degree slowly over time. If
the basic natures of the associated signals, such as vibration, sound or pressure, are
not influenced by the slow-time variations, then it may be possible to recover truly
periodic signals by performing a time-to-angle transformation [LSI]. Even then,
however, there may well remain intracycle variability of sinusoidal data.

2.  Tools Used for Analysis

The tools used for analysis of the data include (i) the minimum variance (MV) and
associated autoregressive (AR) spectral families, along with their theoretical convergence
properties [FFS], (ii) standard power spectral density (PSD) estimates, (iii) extended
Kalman filtering, (iv) fixed order AR models and (v) histogram, scatter plot, and
correlation coefficient information. The families of MV and AR spectra are used to
identify nominally sinusoidal components, and to provide input to the extended Kalman
filter for tracking the time-varying amplitude and frequency of a real sinusoid. The PSD
is used primarily for comparative purposes because it is the tool of choice for analysis of
frequency information contained in (assumedly stationary) random processes. Fixed order
AR models are used in an attempt to better characterize the time-varying behavior of
amplitude and frequency estimates. Finally, basic statistical tools, including histogram
and moment estimates, are used to get a better idea of the distributional properties of
frequency and amplitude information.

3.  The Machinery Data to be Analyzed

The data chosen for analysis are from the Westland Helicopter data base [WES]. This data
base includes vibration data taken from a military helicopter under well controlled test
conditions, and for a variety of planted fault conditions. Our analysis will focus on the
vibration associated with accelerometer number 6 for the no-fault condition, and for a
pinion bearing fault near the measurement location. The data were originally sampled
at 103116.08 Hz. Because of the preponderance of energy below 20,000 Hz, the data were
decimated by a factor of 5. The no-fault data were chosen because of the presence of a very
strong sinusoidal component at 3150 Hz. This affords us the opportunity to study a real
world sinusoid without complications associated with a significant amount of noise
corruption. The pinion bearing data were chosen for two reasons. Since the strength of this
strong sinusoid was notably attenuated, this afforded the opportunity to study a
potentially more complex real world sinusoid. It also provided the opportunity to explore
the potential for using only sparse information associated with a single sinusoid, as
opposed to the totality of information contained in a PSD, to characterize the influence of
a mechanical fault. It should be noted that the above frequency of interest corresponds to
a number of component gear mesh frequencies, but not to the spiral bevel pinion gear
mesh frequency, which is 1109 Hz. Moreover, the pinion bearing theoretical defect
frequency is at 311 Hz.
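The sampling-rate arithmetic implied by the decimation step can be checked with a few lines. The variable names are illustrative; the numbers are those quoted in the text.

```python
# Sampling-rate arithmetic for the decimation step described above.
fs_original = 103116.08          # Hz, original sampling rate quoted in the text
decimation_factor = 5            # decimation factor quoted in the text
fs_decimated = fs_original / decimation_factor
nyquist = fs_decimated / 2.0

frequencies_hz = {
    "sinusoid of interest": 3150.0,
    "spiral bevel pinion gear mesh": 1109.0,
    "pinion bearing defect": 311.0,
}
for name, f in frequencies_hz.items():
    # Normalized frequency in cycles per sample after decimation.
    print(f"{name}: {f} Hz = {f / fs_decimated:.4f} cycles/sample, "
          f"below Nyquist: {f < nyquist}")
```

All three frequencies discussed in this section remain well below the Nyquist frequency of the decimated data.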

4.  Analysis

The top plots in Figure 1 show the raw and band pass filtered time series corresponding
to the no-fault and fault conditions. It is interesting to note that the influence of the fault
is to enhance the modulation, while decreasing the peak level of the sinusoid. Neither of
these influences is to be expected, given the nature of the fault. In fact, the expected
influence would be an increase of energy at the bearing defect frequency, which in this
case is 311 Hz. But the PSD plots in Figure 1 reveal a decrease of energy at that
frequency, and an increase at a slightly lower frequency. Other than this very low
frequency region, the only significant difference due to the fault is the peak values at
approximately 3150 Hz. The plots of the filtered data in this region show that there
appear to be two sinusoidal components in this region, regardless of the fault. The fault
results in a reduction of the stronger peak on the order of 15 dB.

Recall, however, that our goal is not to mathematically decompose a real sinusoid into
true sinusoids, but rather to capture its time-varying amplitude and frequency
characteristics. To this end, we proceed now to investigate the convergence properties of
the MV(n) and AR(n) spectra as n goes to infinity. In [FFS] it was shown that the MV(n)
spectra converge monotonically to the line spectrum associated with sinusoids. It was
also shown that the corresponding AR(n) spectra converge to infinity at the same
frequencies, while converging to the continuous PSD at all other frequencies. While not
shown here due to space limitations, the MV(n) and AR(n) estimated spectra for the
no-fault and faulted raw data clearly suggest the presence of a true sinusoid at
approximately 3150 Hz. The same cannot be said of the spectra for the fault data. Even
though the MV(n) spectra do not exhibit the asymptotic 3 dB drop between orders, as
predicted in [SHL], they also do not suggest convergence. The corresponding AR(n)
spectra exhibit the same multiple peak structure as that of the PSD in Figure 1. So it is
possible that it is this increased modulation effect, relative to the no-fault data, that is
responsible for the non-convergence behavior of these spectra.
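To make the convergence behavior concrete, the sketch below computes an MV(n) spectrum from lagged-product correlation estimates. It assumes the common minimum variance form $1 / (\mathbf{e}^H \mathbf{R}_n^{-1} \mathbf{e})$, which may differ in normalization from the definition in [FFS]; the function names and the small diagonal loading term are choices made here for illustration.

```python
import numpy as np

def autocorr(x, n):
    """Biased lagged-product correlation estimates r(0), ..., r(n-1)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[: N - k], x[k:]) / N for k in range(n)])

def mv_spectrum(x, n, freqs):
    """MV(n) estimate 1 / (e^H R^{-1} e) at normalized frequencies (cycles/sample)."""
    r = autocorr(x, n)
    R = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    R += 1e-9 * r[0] * np.eye(n)       # tiny diagonal loading (assumption) for conditioning
    out = []
    for f in freqs:
        e = np.exp(2j * np.pi * f * np.arange(n))
        out.append(1.0 / np.real(np.conj(e) @ np.linalg.solve(R, e)))
    return np.array(out)
```

With this form, doubling the order can only lower the estimate at any frequency, since the order-n correlation matrix is a leading block of the order-2n matrix. At a true sinusoid the estimate levels off at a value set by the sinusoid power, while over a continuous spectrum it keeps dropping by about 3 dB per doubling.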

Since  the  spectra  use  lagged-product  correlation  estimates,  it  is  also  possible  that  the  lack 
of  exhibited  convergence  could  be  due,  in  part,  to  statistical  variability  associated  with 
these  estimates.  While  there  has  been  some  progress  in  obtaining  the  statistics  of  the 
AR(n)  [LS2]  and  MV(n)  [LIS]  correlation-based  estimates  for  mixed  spectrum  random 
processes, our analysis here will not consider this statistical influence, because doing so
would require a very lengthy and involved analysis and would risk distracting the reader
from more fundamental issues. Recall that the goal of this paper is to explore both the nature of real sinusoids
and  the  tools  we  have  chosen  to  use  for  that  purpose.  We  believe  that  the  MV(n)  and 
AR(n)  spectral  families  have  significant  potential  for  that  purpose.  But  how  they  are  used 
is  equally  important.  For  example,  it  is  commonly  held  that  such  spectra,  for  high  enough 
orders,  can  capture  all  the  important  spectral  information  in  a  given  bandwidth  without 
the  need  for  filtering.  However,  when  attempting  to  take  advantage  of  the  convergence, 
as  opposed  to  modeling  capabilities  of  the  MV(n)  and  corresponding  AR(n)  spectra,  there 
is  a  definite  advantage  in  filtering.  In  particular,  for  the  no-fault  data  the  MV(n)  spectra 
were  noted  to  have  completely  converged,  as  the  n=40,  80  and  160  spectra  are  identical. 
This  strongly  suggests  the  existence  of  a  true  sinusoid.  However,  the  corresponding 
AR(n)  spectra  did  not  exhibit  the  corresponding  +3  dB  asymptotic  increase  per  order 
doubling.  Since  the  sinusoid  at  approximately  3150  Hz  is  not  a  true  sinusoid,  this 
contradictory  behavior  is  not  totally  unexpected;  especially  in  view  of  the  PSD 
information  in  Figure  1.  There,  it  is  observed  that  while  there  appear  to  be  two  closely 
spaced  sinusoids  for  both  the  no-fault  and  fault  data,  the  no-fault  data  has  one  very 
dominant  peak.  The  fact  that  the  MV  spectrum  has  lower  resolving  ability  than  the 
corresponding  AR  spectra  would  explain  this  contradictory  behavior.  Specifically,  at 
lower  orders  both  spectra  would  exhibit  behavior  consistent  with  a  single  sinusoid,  while 
at  higher  orders  the  AR  spectra  would  actually  reduce  in  magnitude  with  the  emerging 
presence  of  two  peaks.  This  behavior  of  these  spectra  for  the  fault  data  is  more  obvious. 
Since  both  the  PSD  in  Figure  1  and  the  AR  spectra  suggest  two  sinusoids  of  relatively 
equal  power,  it  would  appear  to  support  our  speculation  regarding  the  reason  for  the 
contradictory behavior of the MV and AR spectra. It is our belief that the convergence
properties  of  the  families  of  MV  and  AR  spectra  offer  significant  potential  for 
characterization  of  random  processes  involving  real  sinusoids.  However,  from  these 
results,  as  well  as  a  similar  contradictory  behavior  noted  in  their  use  in  the  analysis  of 
diesel  vibration  [SHW],  it  is  apparent  that  more  research  is  needed  to  better  understand 
their  joint  behavior  for  real  sinusoids. 

In order to attempt to capture the time-varying amplitude and frequency behavior of the
real sinusoid at approximately 3150 Hz, as opposed to modeling it as two true sinusoids,
we used an extended Kalman filter (EKF), similar to that used in [SHW]. Specifically,
the band pass filtered data were assumed to consist of a single sinusoid plus white noise.
The sinusoid amplitude and frequency were modeled as uncorrelated random walks. The
choice of this model is based, to a large extent, on our ignorance of their true behavior.
As will be seen, however, it provides a sufficient characterization to allow more realistic
models to be studied. A variety of model covariance values, as well as band pass filters,
were investigated. The results were extremely robust with respect to all of these.
Furthermore, for both the no-fault and fault data the EKF model captured 99.999% of the
total energy in the data.
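A minimal EKF of this general kind is sketched below. It is not the filter of [SHW]: the three-element state (amplitude, frequency in radians per sample, and an accumulated phase), the covariance values and the function name are all assumptions made for illustration. Amplitude and frequency follow random walks, as in the text, and the measurement is the amplitude times the cosine of the phase plus white noise.

```python
import numpy as np

def ekf_track_sinusoid(y, a0, w0, q=(1e-6, 1e-9, 1e-6), r=1e-2, p0=(0.1, 1e-3, 1.0)):
    """Track amplitude a and frequency w (rad/sample) of y[k] = a*cos(p) + noise.

    State is [a, w, p]; a and w are random walks and the phase p accumulates w.
    q, r and p0 are hypothetical process, measurement and initial variances.
    Returns an array whose rows are the filtered [a, w] estimates.
    """
    x = np.array([a0, w0, 0.0])
    P = np.diag(p0).astype(float)
    F = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])            # p(k+1) = p(k) + w(k)
    Q = np.diag(q)
    est = np.zeros((len(y), 2))
    for k, yk in enumerate(y):
        a, w, p = x
        H = np.array([np.cos(p), 0.0, -a * np.sin(p)])   # Jacobian of a*cos(p)
        S = H @ P @ H + r                                # innovation variance
        K = P @ H / S                                    # Kalman gain
        x = x + K * (yk - a * np.cos(p))                 # measurement update
        P = P - np.outer(K, H @ P)
        est[k] = x[0], x[1]
        x = F @ x                                        # time update
        P = F @ P @ F.T + Q
    return est
```

A frequency estimate in hertz follows from multiplying the second column by the sampling rate and dividing by $2\pi$. When the filtered signal is close to zero the measurement carries little phase information, which is consistent with the loss of tracking described for the fault data below.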

The time-varying amplitudes and frequencies for the no-fault and fault data are illustrated
in Figure 2. There are clear differences in these time series for the no-fault versus the
fault data. For example, the heavier frequency modulation behavior associated with the
fault data results in temporal regions wherein the filtered data are close to zero. These
regions cause the EKF to lose tracking ability, and yield frequency estimates which are
well outside of the actual range of activity; hence the very large excursions in estimated
frequency. These excursions are periodic, with a period corresponding to the modulation
period of approximately 0.05 sec. The frequency and amplitude histograms for the
no-fault and fault data are shown in Figure 3. The frequency histogram for the fault data was
truncated in order to exclude the frequency estimates related to poor EKF tracking. Both
frequency and amplitude information are dramatically influenced by the presence of the
fault. Recall that the frequency range of interest here has no known relation to the
characteristic frequency region associated with such a fault. Nonetheless, such strong
differences suggest that there may well be other frequency regions of equal, if not greater,
ability to capture the presence of a fault.

Scatterplots of frequency versus amplitude for the no-fault and fault data are shown in
Figure 4. Again, there is a distinct difference between the no-fault and fault conditions.
The no-fault condition reveals a mild negative correlation (-0.41) between amplitude and
frequency. For the fault condition the correlation is almost zero (0.07). The very narrow
range of frequencies for the real sinusoid shown in Figure 4 results in the very peaked
nature of the fault histogram. Even though the overall correlation between frequency and
amplitude information is only modest for the no-fault data, and is essentially zero for the
fault data, the coherence analysis suggested that there are indeed correlations in specific
frequency ranges.

To further explore the time series structure of the frequency and amplitude estimates
provided by the EKF, the MV(n) and AR(n) tools used to analyze the measurement data
were applied. One notable result indicated in the MV spectra is the presence of a strong
sinusoidal component (0 dB) in the no-fault frequency time series at twice the frequency
being studied. This component, shown in Figure 5, is essentially absent (-90 dB) for the
fault condition. In this same time series the AR spectra indicate a very clear difference in
the continuous spectral structure between the no-fault and fault conditions. In particular,
the no-fault frequency time series exhibits an AR(2) shape, with a spectral peak around
2000 Hz. For the fault condition it becomes an AR(1) shape, and increases by 20 dB at
lower frequencies. A closer look at the low frequency behavior of these spectra
required that the frequency and amplitude time series be decimated in order to take
advantage of the convergence properties of the MV and AR spectra (cf. [SHL] for a
detailed discussion). These data were decimated by a factor of 10. Application of the MV
and AR tools to these data revealed that for the no-fault condition the 20 Hz modulation
behavior discussed above appears as strong sinusoidal components at this same
frequency in both the frequency and amplitude MV (and AR) spectra. There is no
evidence whatsoever of such a periodicity in the frequency time series for the fault data,
even though the amplitude time series for both the no-fault and fault conditions retains a
20 Hz periodicity.


5.  Summary  and  Conclusions 

The purpose of this effort was to investigate the potential of a combination of signal
processing and basic statistical tools for characterization of real sinusoids. This was done
in the context of sinusoids associated with no-fault and fault vibration data from a
military helicopter power system. The fault addressed was a pinion bearing fault having a
characteristic fault frequency of 311 Hz. Rather than investigating this frequency region
it was decided to investigate the region around 3150 Hz. In that region it was observed
that not only were strong sinusoids present, but that the bearing fault had a significant
influence on the data structure. For both the no-fault and fault conditions it was noted that
two sinusoidal components spaced 20 Hz apart were present, but that the fault resulted in
a significant attenuation of one of the two. The 3150 Hz frequency happens to be a gear
mesh frequency. Because it is also the 10th harmonic of the fault frequency it is possible
that it is the 10th harmonic which is responsible for the change in the spectral structure in
this region. However, since it is a relatively high harmonic, and since the change was so
significant, one might posit that there are other influences of the fault on the power
system vibration characteristics. While there is no direct support here for such
speculation, the investigation resulted in a number of very significant findings. First, it
was found that a model for a single real sinusoid, having time-varying amplitude and
frequency, was able to completely characterize the band pass data including the two
theoretical sinusoidal components. Analysis of the amplitude and frequency time series
provided some novel insight into the real sinusoid that goes well beyond a simplistic
two-sinusoid model suggested by traditional spectral analysis. For example, it was noted that
even though the overall correlation between the amplitude and frequency of the real
sinusoid associated with the fault data was close to zero, there was a strong correlation in
specific frequency regions. It was also found that the frequency and amplitude data both
had a strong periodic component at approximately two times the frequency of interest,
regardless of machine condition. While the same was true for the amplitude time series in
the range corresponding to the fault frequency, for the frequency time series the presence
of the fault all but eliminated any periodic behavior. Finally, both histogram and scatter
plot information revealed such distinct differences between the no-fault and fault
conditions that discernment of these two conditions based on this information would be
trivial. It is not the intent to suggest that information related to a single real sinusoid
should be used instead of classic fault frequency information. However, our analysis
suggests that a fault may influence data in a far greater way than has been considered to
date.

Figure 1. Plots of raw and filtered no-fault (left) and fault (right) data (top); PSD estimates of the above
data (bottom).

Figure 2. EKF frequency (left) and amplitude (right) estimates for no-fault (top) and pinion bearing
fault (bottom) filtered data.

Figure 3. Histograms of frequency (left) and amplitude (right) estimates for no-fault (top) and fault
(bottom) data.

Figure 4. Amplitude versus frequency scatterplots for the filtered no-fault (left) and fault (right) data.

Figure 5. AR(n) spectra for estimated time-varying frequency for no-fault (left) & fault (right) data.


It is difficult to track change in telecommunication networks when the
logical structure is dynamic. Measures of network difference are required
to indicate significant changes of concern. This paper discusses possible
approaches towards developing suitable measures of change.


Key Words: network management, change detection, graph matching


1.  INTRODUCTION 

There are various problems requiring the ability to measure network change. In
the management and control of dynamic telecommunication systems, the early
detection of significant network events and abnormal trends is an important network
performance monitoring capability providing advance warning of possible fault
conditions [19], or at least assisting with the identification of causes and locations of
known problems. Such capabilities are increasingly necessary as telecommunication
networks become more complex and dynamic, for example large heterogeneous
enterprise networks [3].

Network performance monitoring is typically undertaken using statistical
techniques to analyse variations in traffic distribution [9, 11] or changes in topology
[20]. Network visualisation techniques are also used to monitor changes in
telecommunication networks [1]. A useful complement to these approaches is a measure of
change in the network, capturing both topology and traffic flow, to highlight when
and where in the network significant events may be occurring [18]. Other network
management tools can then be focussed on likely problem regions of the network for
further analysis. This paper provides an overview of approaches being investigated
to identify a suitable measure of network change for such applications.

A measure of network change can be determined by representing a given network,
observed at time $t$, by a directed graph (digraph). Edge direction can be used to
indicate the direction of traffic flow between two adjacent nodes in the network with
an edge label representing traffic flow. A second graph can be used to represent the
same network at a later time $t + \Delta t$, where $\Delta t$ is some arbitrary time interval. This
second graph can be compared with the original graph using a measure of network

P.J.  SHOUBRIDGE,  M.  KRAETZL,  ET.  AL. 


difference between the two graphs to indicate the degree of change occurring in
the network over the time interval $\Delta t$. By continuing network observations over
subsequent time intervals, the graph difference measures provide a trend of the
network's dynamic behaviour as it evolves over time.

The  problem  then  becomes  one  of  finding  good  graph  distance  measures  that  are 
sensitive  to  significant  change  events  but  insensitive  to  typical  variations  in  network 
topology or flow. Following from this is the requirement to detect significant events
given a suitable distance measure; the detection problem is not considered in
this paper. In addition to a graph distance measure, it is also necessary to readily
identify  where  in  the  network  significant  change  events  have  occurred.  This  requires 
the  association  of  location  with  measured  change. 

2.  GRAPH  MATCHING 

The  communications  network  being  considered  is  represented  as  a  graph.  A  graph 
consists  of  a  set  of  vertices  representing  network  nodes,  and  a  set  of  edges  which  are 
ordered  pairs  of  vertices;  a  pair  of  vertices  denote  the  endpoints  of  an  edge.  Two 
vertices  are  adjacent  if  they  are  the  endpoints  of  some  edge. 

Definition 2.1. A directed and labelled graph $G$ is a 6-tuple $G = (V, E, L_V, L_E, \mu, \nu)$ where:

• $V$ is a finite set of primitive objects called vertices;

• $L_V$ is a finite set of labels (for the vertices);

• $\mu : V \to L_V$ is a function assigning labels to vertices;

• $E \subseteq V \times V$ is the set of edges;

• $L_E$ is a finite set of labels (for the edges);

• $\nu : E \to L_E$ is a function assigning labels to edges.

Edges are directed with the vertex pair $(i,j) \in E$ denoting traffic flow from node
$i$ to node $j$ in the network. Vertex labels $\mu$ are used to uniquely identify network
nodes while edge labels $\nu$ are assigned traffic flow parameters measured over some
time interval $\Delta t$. The number of vertices in $G$ is denoted $|V|$ and likewise the
number of edges $|E|$.

Two graphs can be considered the same if a graph isomorphism exists between
them [2]. Graph isomorphisms can be detected by mapping the vertices of one
graph $G_1$ onto the vertices of a second graph $G_2$. A valid vertex mapping is found
if the edge structure of $G_1$ is preserved in $G_2$ by the mapping. If all the vertices of
$G_1$ can successfully be mapped to all the vertices of $G_2$, a graph isomorphism has
been found [7].

A technique known as error correcting graph matching (ecgm) can be used to
measure the distance or dissimilarity between two graphs. Unlike strict graph
isomorphism detection, this approach enables inexact matching. Error correcting
graph matching evaluates the minimum number of edit operations required to
modify an input graph such that it becomes a graph isomorphism of some reference
graph. This can include the possible insertion and deletion of edges and vertices, as
well as possible label substitutions [15]. Generally, error correcting graph matching
algorithms assign costs to each of the edit operations and use efficient tree search
techniques to identify the sequence of edit operations resulting in the lowest total
edit cost [5, 13]. The resultant lowest total edit cost is a measure of the distance
between the two graphs.

Consider first changes in network structure or topology only. In general graph
matching problems there exists more than one possible sequence of edit operations
due to the occurrence of multiple possible vertex mappings. The ecgm algorithms
search for the edit sequence that results in the minimum edit cost. However, with
performance monitoring of a communications network, vertex label substitution
is not a possible edit operation because vertex labels reference unique physical or
logical nodes within a network. As a result, the combinatorial search reduces to
the simple identification of elements (vertices and edges) inserted or deleted from
one graph $G_1$ to produce the other graph $G_2$.

If  the  cost  associated  with  the  insertion  and  deletion  of  individual  elements  is  1, 
the  edit  sequence  cost  becomes  the  difference  between  the  total  number  of  elements 
in  both  graphs,  and  all  graph  elements  in  common: 

Definition 2.2. Let the graph $G_1 = (V_1, E_1, \mu_1, \nu_1)$ represent the communication
network operating at time $t_1$, and let $G_2 = (V_2, E_2, \mu_2, \nu_2)$ describe the same
network at time $t_2$, where $t_2 = t_1 + \Delta t$. The network edit distance $d(G_1, G_2)$ is:

$$d(G_1, G_2) = |V_1| + |V_2| - 2|V_1 \cap V_2| + |E_1| + |E_2| - 2|E_1 \cap E_2| \qquad (1)$$


Clearly the edit distance, as a measure of topology change, increases with increasing
degree of change experienced by the network over $\Delta t$. Edit distance $d(G_1, G_2)$
is bounded below by $d(G_1, G_2) = 0$ when $G_2$ and $G_1$ are isomorphic (i.e. there is
no change), and above by $d(G_1, G_2) = |V_1| + |V_2| + |E_1| + |E_2|$ when $G_1 \cap G_2 = \emptyset$,
the case where the networks are completely different.
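
As a minimal sketch of equation (1), assuming uniquely labelled vertices, unit insertion and deletion costs, and graphs held as Python sets, the distance reduces to set arithmetic. The function name `network_edit_distance` and the example labels are illustrative and do not come from the paper.

```python
def network_edit_distance(v1, e1, v2, e2):
    """Edit distance of equation (1) for uniquely labelled digraphs.

    v1, v2: iterables of vertex labels; e1, e2: iterables of (source, destination)
    pairs. Unit cost per inserted or deleted vertex or edge is assumed.
    """
    v1, v2, e1, e2 = set(v1), set(v2), set(e1), set(e2)
    vertex_term = len(v1) + len(v2) - 2 * len(v1 & v2)
    edge_term = len(e1) + len(e2) - 2 * len(e1 & e2)
    return vertex_term + edge_term


# Hypothetical snapshots of the same network at t1 and t2.
g1_vertices, g1_edges = {"a", "b", "c"}, {("a", "b"), ("b", "c")}
g2_vertices, g2_edges = {"a", "b", "d"}, {("a", "b"), ("b", "d")}

distance = network_edit_distance(g1_vertices, g1_edges, g2_vertices, g2_edges)
```

In this example one vertex and one edge are deleted and one vertex and one edge are inserted, so four edit operations are counted. Identical snapshots give 0, and snapshots with nothing in common give the upper bound stated above.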

The expression for graph edit distance provides a measure of difference between
two graphs in terms of topology. While traffic flow represented as edge labels $\nu$
can also be incorporated into an ecgm algorithm, it is difficult to design a suitable
cost function that satisfies both topology and traffic variations. For example, what
is the relationship between traffic fluctuations of an order of magnitude compared
to the insertion of a new vertex? Alternatively, could an integer representation
of traffic flow be successfully mapped to a topology problem where the number
of multi-edges indicates the volume of traffic flow? Such issues are currently being
investigated.


3.  GRAPH  STRUCTURES 

Distance  measures  between  graphs  have  also  been  proposed  using  graph  matching 
techniques  that  focus  on  underlying  structures  or  properties  within  the  graphs.  For 
example,  finding  the  maximum  common  subgraph  of  two  graphs  is  an  indication  of 
the  commonality  between  graphs  [10,  12].  These  techniques  tend  to  be  based  on 
subgraph  isomorphism  algorithms  [14,  21]. 

An induced subgraph, where common vertices must have all incident edges in
both graphs, can be defined as follows:

Definition 3.1. The induced subgraph of a digraph $G = (V, E, \mu, \nu)$ generated
by the subset $W$ of $V$ is the digraph $G_W = (W, E_W, \mu_W, \nu_W)$ where:

• $\mu_W : W \to L_V$ is the restriction of $\mu$ to $W$;

• $E_W = E \cap (W \times W)$;

• $\nu_W : E_W \to L_E$ is the restriction of $\nu$ to $E_W$.

A distance metric has been defined based on the determination of the maximal
common subgraph of two graphs [6]:

$$d(G_1, G_2) = 1 - \frac{|\mathrm{mcis}(G_1, G_2)|}{\max\{|G_1|, |G_2|\}} \qquad (2)$$

where $\mathrm{mcis}(G_1, G_2)$ denotes the maximal common induced subgraph of $G_1$ and $G_2$,
and $|G|$ denotes the number of vertices in the graph $G$. There is no reason why the
number of edges could not be used as $|G|$. The resulting distance measure is still a
metric and may prove useful.

It  is  interesting  to  note  that  this  metric  is  related  to  error  correcting  graph 
matching  [4]. 

The most general form of the distance metric (2) is given by:

$$d(G_1, G_2) = 1 - \frac{m(G_1, G_2)}{M(G_1, G_2)}$$

where $m(G_1, G_2)$ is a measure of similarity between $G_1$ and $G_2$, and $M(G_1, G_2)$ is
a measure of the size of the problem.

The size of the problem may also be defined as the number of vertices in the
union of the two graphs. This resulting distance measure:

$$d(G_1, G_2) = 1 - \frac{|\mathrm{mcis}(G_1, G_2)|}{|G_1 \cup G_2|}$$

can also be shown to be a metric.

In  the  telecommunications  application,  it  seems  natural  to  view  the  size  of  the 
problem  as  the  union  of  the  two  graphs.  Using  the  union  rather  than  the  larger  of 
the  two  graphs  distinguishes  variations  in  the  size  of  the  smaller  of  the  two  graphs. 
If  only  the  size  of  the  larger  graph  is  used  to  represent  problem  size,  the  distance 
between  graphs  will  remain  unchanged  even  if  the  smaller  graph  changes  its  size, 
assuming  that  the  size  of  the  maximum  common  subgraph  remains  constant.  This 
latter  metric  may  provide  a  more  accurate  measure  of  the  relative  graph  difference. 
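
The effect of the two normalisations can be checked with a small numerical sketch. It assumes the sizes $|G_1|$, $|G_2|$, $|G_1 \cup G_2|$ and $|\mathrm{mcis}(G_1, G_2)|$ are already known (finding the mcis itself is not attempted here); the function names and the vertex counts are hypothetical.

```python
from fractions import Fraction


def mcis_distance_max(size_g1, size_g2, size_mcis):
    """Distance normalised by the larger graph, as in equation (2)."""
    return 1 - Fraction(size_mcis, max(size_g1, size_g2))


def mcis_distance_union(size_union, size_mcis):
    """Distance normalised by the size of the union of the two graphs."""
    return 1 - Fraction(size_mcis, size_union)


# Hypothetical case: |G1| = 10 and |mcis| = 4 stay fixed while the smaller
# graph G2 grows from 6 to 8 vertices, so the union grows from 12 to 14.
max_before = mcis_distance_max(10, 6, 4)
max_after = mcis_distance_max(10, 8, 4)
union_before = mcis_distance_union(12, 4)
union_after = mcis_distance_union(14, 4)
```

With these numbers the max-normalised distance stays at 3/5 in both cases, while the union-normalised distance moves from 2/3 to 5/7, which is the sensitivity to the smaller graph described above.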

There  are  other  approaches  that  may  prove  useful  in  the  application  of  network 
change  measurement.  Instead  of  measuring  change  by  concentrating  on  network 
elements,  as  in  the  cases  for  ecgm  and  mcis,  better  indicators  of  significant  change 
could  perhaps  be  obtained  by  examining  higher  level  structures  within  the  network. 
One  such  structure  being  investigated  is  that  based  on  vertex  neighbourhoods. 

An input graph $G_1$ is used to generate a neighbourhood graph $N_1$ that contains
an undirected edge between two vertices if the same vertices in $G_1$ share one or
more common adjacent vertices, i.e. are connected to common neighbours. Similarly
a graph $N_2$ can be generated from $G_2$. The distance between $G_1$ and $G_2$
is now derived from a graph matching distance measure taken between the two
neighbourhood graphs $N_1$ and $N_2$. If for example edges are deleted in $N_1$ or $N_2$,
they have far greater significance because two underlying vertices in $G_1$ or $G_2$ are
now no longer connected via common neighbours. It is unclear at this time what
advantages such an approach might have, but it is expected to improve indications
of significant change in the underlying network.
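
The construction of a neighbourhood graph can be sketched as follows. This is one possible reading rather than the authors' implementation: it assumes that adjacency ignores edge direction and that self-loops are skipped, and the name `neighbourhood_graph` is illustrative.

```python
from itertools import combinations


def neighbourhood_graph(vertices, edges):
    """Return the undirected edge set of the neighbourhood graph.

    Two vertices are joined if they share at least one common adjacent vertex.
    Assumption: adjacency is taken without regard to edge direction.
    """
    adjacent = {v: set() for v in vertices}
    for u, w in edges:
        if u != w:  # ignore self-loops
            adjacent.setdefault(u, set()).add(w)
            adjacent.setdefault(w, set()).add(u)

    n_edges = set()
    # Every pair of neighbours of a vertex shares that vertex as a common neighbour.
    for neighbours in adjacent.values():
        for u, w in combinations(neighbours, 2):
            n_edges.add(frozenset((u, w)))
    return n_edges


# Hypothetical path a -> b -> c: a and c share the neighbour b.
n1 = neighbourhood_graph(["a", "b", "c"], [("a", "b"), ("b", "c")])
# After edge (b, c) is deleted, a and c no longer share a neighbour.
n2 = neighbourhood_graph(["a", "b", "c"], [("a", "b")])
```

A distance such as the edit distance of equation (1) can then be applied to the two neighbourhood edge sets instead of the original graphs.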

In  addition  to  these  techniques,  the  analysis  of  graph  adjacency  matrices  has 
been  shown  to  yield  useful  distance  metrics  between  graphs.  The  adjacency  matrix 
of  a  graph  G  is  defined  as: 

Definition 3.2. The adjacency matrix $A = [a_{ij}]$ of graph $G$ is a $|V| \times |V|$ matrix
where

$$a_{ij} = \begin{cases} \nu(i,j) & \text{if edge } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

If  the  two  graphs  to  be  compared  are  uniquely  labelled,  it  is  possible  to  measure 
the  difference  between  the  graphs  using  a  Hamming  distance  measure  [16].  This 
metric  defines  the  distance  between  the  two  graphs  as  the  number  of  elements 
in  which  their  respective  adjacency  matrices  differ.  This  approach  is  somewhat 
similar to the basic ecgm distance measure shown in equation (1). Analysis of
graph adjacency matrices using spectral graph theory [8] has also been shown to
provide useful graph distance measures [17].
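
A minimal sketch of the Hamming distance between adjacency matrices follows. It assumes 0/1 entries (edge labels are ignored) and indexes both matrices over the union of the vertex labels so that entries line up; the helper names are illustrative.

```python
def adjacency_matrix(vertices, edges):
    """0/1 adjacency matrix with rows and columns in the order of `vertices`."""
    index = {v: k for k, v in enumerate(vertices)}
    a = [[0] * len(vertices) for _ in vertices]
    for i, j in edges:
        a[index[i]][index[j]] = 1
    return a


def hamming_distance(v1, e1, v2, e2):
    """Number of entries in which the two adjacency matrices differ."""
    vertices = sorted(set(v1) | set(v2))  # common index set (labels must be sortable)
    a1 = adjacency_matrix(vertices, e1)
    a2 = adjacency_matrix(vertices, e2)
    return sum(x != y for row1, row2 in zip(a1, a2) for x, y in zip(row1, row2))


d_hamming = hamming_distance(
    {"a", "b", "c"}, {("a", "b"), ("b", "c")},
    {"a", "b", "d"}, {("a", "b"), ("b", "d")},
)
```

Unlike equation (1), this count reflects only edge differences; vertices that appear or disappear contribute only through their incident edges.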


4.  LOCATION  OF  CHANGE 

Consider the scenario where a graph distance measure has indicated that two
digraphs $G_1$ and $G_2$ have significantly different network topologies over a single
time interval $\Delta t$. Of particular interest is the distribution of this change during the
transition from $G_1$ to $G_2$. The following technique ranks all vertices within the two
digraphs ($G_1 \cup G_2$) in increasing order of the number of network topology change
events experienced by individual vertices.

Differences between digraphs $G_1$ and $G_2$ are represented by a change matrix $C$
that indicates where edges have been deleted from $G_1$ or inserted into $G_2$. The
matrix $C$ has a row and column for every vertex contained within the two graphs.
The existence of a directed edge deleted from $G_1$ (or inserted into $G_2$) is represented
in the matrix $C = [c_{ij}]$ by the corresponding row column entry $c_{ij} = 1$. Indices $i$
and $j$ denote the respective source and destination vertices of the deleted or inserted
directed edge. Any edges $(i,j)$ that remain incident to the same pair of vertices
in both $G_1$ and $G_2$ result in the corresponding entry $c_{ij} = 0$, indicating that no
change has occurred. For all other entries $c_{ij} = 0$.

If a permutation of the matrix $C$ is found such that the row sums and column
sums of $C$ are arranged in ascending order, entries in $C$ where $c_{ij} = 1$ will tend
to occupy a lower right block of $C$. The corresponding row and column vertices
will be ranked in ascending order of the number of incident edge deletion and/or
insertion operations, with the last row and column vertex pair denoting the vertices
that experienced the greatest change in the transition from graph $G_1$ to $G_2$.
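
The ranking step can be sketched as below. One simplification is assumed: instead of searching for a permutation that orders row sums and column sums separately, each vertex is scored by its row sum plus its column sum in $C$ and the vertices are sorted by that score. The function name and the example graphs are hypothetical.

```python
def rank_vertices_by_change(v1, e1, v2, e2):
    """Build the change matrix C and rank vertices by incident change events."""
    vertices = sorted(set(v1) | set(v2))  # labels are assumed sortable
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)

    # c[i][j] = 1 where edge (i, j) was deleted from G1 or inserted into G2.
    c = [[0] * n for _ in range(n)]
    for i, j in set(e1) ^ set(e2):
        c[index[i]][index[j]] = 1

    # Score = outgoing changes (row sum) + incoming changes (column sum).
    score = {
        v: sum(c[index[v]]) + sum(row[index[v]] for row in c)
        for v in vertices
    }
    ranking = sorted(vertices, key=lambda v: score[v])  # ascending, stable
    return c, score, ranking


vertices = {"a", "b", "c", "d"}
edges_t1 = {("a", "b"), ("b", "c"), ("c", "d")}
edges_t2 = {("a", "b"), ("b", "d"), ("d", "c")}
change, score, ranking = rank_vertices_by_change(vertices, edges_t1, vertices, edges_t2)
```

Here vertex a is untouched, b is involved in two change events, and c and d in three each, so the vertices at the end of the ranking mark where the topology changed most.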

5. CONCLUSION

Several techniques have been presented that could successfully measure the
degree of change occurring within a communications network. Preliminary simulation
investigations into the use of ecgm and mcis distance metrics indicate that
effective change measurement is feasible. It is expected that applying ecgm or mcis
distance metrics to graphs that represent higher level structures within the
underlying network (e.g. common neighbours) should in general produce improved
results. However, this is still to be confirmed.

This paper considers a situation in which a detection system is configurable in
such  a  way  as  to  provide  multiple  modes  of  operation  that  differ  in  their  detection 
performance  and  geographical  coverage.  A  technique  for  optimal  mode  selection 
based  upon  minimizing  Bayesian  risk  is  formulated  and  demonstrated  for  the  case 
of  a  two-mode  system. 


0.  PROLOGUE 

In  the  first  Joint  Australia/U.S.  Workshop  on  Defence  Signal  Processing,  held  in  Adelaide 
in  1997,  two  of  the  authors  (Sinno  and  Cochran)  presented  a  paper  involving  estimation 
using  a  configurable  sensor  system  [6].  During  the  Workshop,  Dr.  Paul  Miller  of  the 
Australian Defence Science and Technology Organisation told us about a real-world
scenario in which searches for ground vehicles are carried out over vast uninhabited areas by
helicopters  outfitted  with  dual-mode  radar  systems.  We  understood  the  operating  modes 
of  the  radar  system  to  be  such  that  they  could  be  loosely  described  as  “broad  search”  and 
“focused”  modes  and  that  the  strategy  for  switching  between  modes  during  a  search  was 
left  to  the  helicopter  crew. 

We  subsequently  proposed  a  mathematical  model  of  this  kind  of  scenario  and  a  Bayesian 
approach to choosing mode switching strategies [1]. This formulation made use of a payoff
function  consisting  of  two  terms,  one  of  which  captures  the  performance  of  the  sensing 
strategy  in  detecting  the  presence  of  a  target  in  the  search  area  and  the  other  measuring  its 
effectiveness  at  localizing  the  target. 

When we were invited to participate in the second Australia/U.S. Workshop, revisiting
this  problem  and  exploring  an  alternative  approach  that  addresses  a  shortcoming  of  our 
technique  in  [1]  seemed  a  natural  way  to  connect  our  contribution  with  the  previous 
Workshop  and  the  many  fine  technical  interactions  it  seeded.  The  approach  in  this  paper  is 
based  upon  Bayesian  risk  analysis  and  it  eliminates  concerns  regarding  correlation  between 
the  terms  of  the  payoff  function  arising  in  our  previous  treatment  of  the  problem. 

1.  INTRODUCTION 

This  paper  considers  a  situation  in  which  a  detection  system  is  configurable  in  such  a 
way  as  to  provide  multiple  modes  of  operation  that  differ  in  their  detection  performance  and 
geographical  coverage.  The  development  that  follows  focuses  on  the  case  of  a  detector  with 
two  operating  modes:  a  “broad  search”  mode  that  provides  wide  coverage  and  a  “focused” 
mode that provides better detection performance but covers less area. The system is invoked
in  a  sequence  of  tests  to  detect  and  localize  a  target  within  a  framework  that  is  formulated 
precisely  in  the  following  section. 

As  noted  in  the  Prologue,  this  problem  was  addressed  in  an  earlier  paper  [1]  using  a  cost 
functional  approach  in  conjunction  with  a  Bayesian  method  for  incorporating  the  results  of 
earlier tests in deciding which mode to use in each test. The method presented here is
more classical: it is based on minimization of Bayesian risk, which allows more precise
designation of the priority of correct detection relative to that of correct localization.

2.  PROBLEM  SETUP 

The situation described above is modeled as follows. The entire region of interest $C$ is
partitioned into $N$ disjoint cells $C_1, \ldots, C_N$. Operating in the broad search mode (Mode A),
the detector tests for the presence of a signal source in $C$. In the focused mode (Mode B),
however, the test may be applied to exactly one cell $C_n$.

To  account  for  difference  in  detector  performance  in  the  two  operating  modes,  detector 
performance  is  modeled  as  arising  from  the  problem  of  detecting  a  known  signal  in  white 
gaussian  noise  of  known  variance.  This  model  provides  a  well  understood  solution  (i.e., 
the  matched  filter)  in  each  test,  admits  several  straightforward  generalizations,  and  allows 
detection  performance  in  Mode  B  to  be  distinguished  from  that  in  Mode  A  by  simply  raising 
the  signal-to-noise  ratio  (SNR).  More  specifically,  in  each  mode  of  operation  the  detector 
encounters  a  problem  of  the  form 


$$\begin{aligned} H_0 &: X = N \\ H_1 &: X = S + N \end{aligned} \qquad (1)$$

where $S$ is a known signal $M$-vector with energy $\|S\|^2 = 1$ and $N$ is a zero-mean white
gaussian $M$-vector having known variance $\sigma^2$; i.e., $N \sim \mathcal{N}[0, \sigma^2 I]$ where $I$ is the $M \times M$
identity matrix. Since $\|S\|$ is fixed, the SNR (and hence the performance of the detector)
in each mode can be adjusted by varying $\sigma^2$.

Assuming at most one signal source is present, denote by $H_1$ and $H_0$ the events that
the signal source is, respectively, present in and absent from $C$. Let $h_0 = H_0$ and, for
$n = 1, \ldots, N$, denote by $h_n$ the event that the signal source is present in cell $C_n$. With these
definitions, $H_1 = \bigcup_{n=1}^{N} h_n$. Regardless of whether it is operating in Mode A or Mode B,
the system yields a decision $\to h_n$ with $0 \le n \le N$.

Recall that the optimal solution, in terms of minimal probability of error, to a detector
problem of the form (1) is a test on the inner product $S^T X$ where the detection threshold
is a function of the a priori probability that a signal is present [2, 5]. The probabilities
of detection and false alarm for each test are given by error functions of the detection
thresholds. In particular, the tests applied in both operating modes will be of this form, but
their detection thresholds and probabilities of detection and false alarm will all be different
(even when Mode B is applied to different cells) because of their dependence on $\Pr(h_n)$,
$n = 0, \ldots, N$.
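
For the model in (1) with $\|S\|^2 = 1$, the statistic $S^T X$ is gaussian with variance $\sigma^2$ and mean 0 under $H_0$ or 1 under $H_1$, so the minimum-probability-of-error threshold and the resulting detection and false alarm probabilities have a standard closed form. The sketch below illustrates this under those assumptions; the function names and the argument `prior_present` (the a priori probability that a signal is present, strictly between 0 and 1) are introduced here for illustration.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = Pr(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def matched_filter_performance(sigma2, prior_present):
    """Threshold, P_d and P_f for the test 'decide H1 if S^T X > threshold'.

    Assumes ||S||^2 = 1 and white gaussian noise of variance sigma2.
    The likelihood ratio test with minimum probability of error gives
    threshold = 1/2 + sigma2 * ln(Pr(H0) / Pr(H1)).
    """
    sigma = math.sqrt(sigma2)
    threshold = 0.5 + sigma2 * math.log((1.0 - prior_present) / prior_present)
    p_false_alarm = q_function(threshold / sigma)        # statistic ~ N(0, sigma2)
    p_detection = q_function((threshold - 1.0) / sigma)  # statistic ~ N(1, sigma2)
    return threshold, p_detection, p_false_alarm


threshold, p_d, p_f = matched_filter_performance(sigma2=2.0, prior_present=0.5)
```

Lowering `prior_present` raises the threshold and lowers both probabilities, which is the dependence on $\Pr(h_n)$ noted above.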


3.  A  BAYESIAN  RISK  FORMULATION 
Using the notation of [5], define a random “state of nature” parameter by $\theta = n$ if
$h_n$ is true, $n = 0, 1, \ldots, N$. A prior distribution for $\theta$ is assumed and a test (i.e., a Mode
A test or a Mode B test on a particular cell $C_n$) is chosen and performed yielding a binary
outcome $o$. If $\Pr(H_0 \mid o) > \Pr(H_1 \mid o)$, the system decides for $H_0$. Otherwise, the system
decides in favor of the hypothesis $h_n$ having the largest posterior probability $\Pr(h_n \mid o)$;
i.e., in this case the system decision rule $\phi$ takes the value $n$ if $h_n$ has the largest posterior
probability. As shown in [1], these posterior probabilities are straightforward to compute
using the detection and false alarm probabilities of the chosen test, which follow from the
(prior) distribution of $\theta$. The overall algorithm is depicted schematically in FIG. 1. Note
that, once a test is selected, the rule $\phi$ for choosing a hypothesis $h_n$ based on the test’s
outcome is well defined.

[Flowchart: N+1 possible tests → choose test that minimizes overall Bayesian risk (based on a
loss function) → get measurement; set detector threshold to minimize local error; deduce an
outcome: detect or no detect → use outcome to update priors → generate MAP decision.]

FIG. 1. Detection/localization algorithm.

The  approach  to  mode  (i.e.,  test)  selection  is  to  choose  the  one  that  minimizes  Bayes  risk 
with  respect  to  a  pre-defined  loss  functional.  Since  the  overall  goal  of  the  system  is  to  both 
detect  the  signal  source  and  localize  it,  with  these  two  subgoals  possibly  being  of  unequal 
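importance, a loss functional of the following form is used:

$$L(\theta, \phi) = \begin{cases} 0, & \theta = \phi \\ 1, & \theta \neq \phi,\ \phi \neq 0,\ \text{and } \theta \neq 0 \\ c_1, & \theta \neq \phi \text{ and } \theta = 0 \\ c_2, & \theta \neq 0 \text{ and } \phi = 0 \end{cases}$$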


With $c_1 > 1$, this functional imposes a greater penalty for a false alarm (i.e., deciding in
favor of $H_1$ when $H_0$ is true) than for correct detection with incorrect localization (i.e., $H_1$
is correctly chosen, but the wrong cell is picked). With $c_2 > c_1$, an even greater penalty
is levied if the system decides in favor of $H_0$ when a target is actually present. Depending
on the application, the weights $c_1$ and $c_2$ can be chosen to adjust the relative importance of
detection and classification in an intuitively appealing way.

With this loss functional, the risk is

$$R(\theta, \phi) = \begin{cases} c_1 \sum_{n \neq 0} \Pr(\to h_n \mid h_0), & \theta = 0 \\ c_2 \Pr(\to h_0 \mid h_k) + \sum_{n=1, n \neq k}^{N} \Pr(\to h_n \mid h_k), & \theta = k \neq 0 \end{cases}$$

and the Bayes average risk of the decision rule $\phi$ is thus

$$E[R(\theta, \phi) \mid \theta] = c_1 \Pr(h_0) \sum_{n=1}^{N} \Pr(\to h_n \mid h_0) + \sum_{n=1}^{N} \Pr(h_n) \left[ c_2 \Pr(\to h_0 \mid h_n) + \sum_{k=1, k \neq n}^{N} \Pr(\to h_k \mid h_n) \right]$$


Since the decision rule $\phi$ (and hence each decision $\to h_n$) depends on the probabilities
of the hypotheses $h_k$ posterior to the test, this quantity depends on which test (mode) is
selected.

The mode is chosen to minimize the conditional expectation of the Bayes average risk.
The posterior probabilities of each $h_n$ given a particular test and its outcome can be
calculated using Bayes’ rule, the prior probabilities of the hypotheses, and the detectors’
probabilities of detection and false alarm. These calculations are given explicitly in [1].
Once a mode is selected and the test is performed, the decision $\to h_n$ is completely
determined by its outcome $o_m \in \{0, 1\}$; prior to performing the test, the only uncertainty
about the decision arises because the outcome of the test is not yet known. The conditional
expectation of the Bayes average risk for a chosen test $T_m$ is $R_0 \Pr(o_m = 0) + R_1 \Pr(o_m = 1)$
where

$$R_0 = E[R(\theta, \phi) \mid \theta, o_m = 0] = c_1 \Pr(h_0 \mid o_m = 0) \sum_{n=1}^{N} \Pr(\to h_n \mid h_0, o_m = 0) + \sum_{n=1}^{N} \Pr(h_n \mid o_m = 0) \left[ c_2 \Pr(\to h_0 \mid h_n, o_m = 0) + \sum_{k=1, k \neq n}^{N} \Pr(\to h_k \mid h_n, o_m = 0) \right]$$

and

$$R_1 = E[R(\theta, \phi) \mid \theta, o_m = 1] = c_1 \Pr(h_0 \mid o_m = 1) \sum_{n=1}^{N} \Pr(\to h_n \mid h_0, o_m = 1) + \sum_{n=1}^{N} \Pr(h_n \mid o_m = 1) \left[ c_2 \Pr(\to h_0 \mid h_n, o_m = 1) + \sum_{k=1, k \neq n}^{N} \Pr(\to h_k \mid h_n, o_m = 1) \right]$$


The complexity of these expressions for $R_0$ and $R_1$ belies their relatively simple nature.
There are two cases:

• Case 1: $\phi = 0$

$$R_0 = c_2 \Pr(H_1 \mid o_m = 0)$$

$$R_1 = c_2 \Pr(H_1 \mid o_m = 1)$$

• Case 2: $\phi = k > 0$

$$R_0 = c_1 \Pr(H_0 \mid o_m = 0) + \Pr(H_1 \mid o_m = 0)[\Pr(H_1 \mid o_m = 0) - \Pr(h_k \mid o_m = 0)]$$

$$R_1 = c_1 \Pr(H_0 \mid o_m = 1) + \Pr(H_1 \mid o_m = 1)[\Pr(H_1 \mid o_m = 1) - \Pr(h_k \mid o_m = 1)]$$

The conditional probabilities in these expressions are exactly the post-test probabilities
computed in [1]; the probabilities of the test outcomes are computed as follows. For a
Mode A test,

$$\Pr(o_m = 0) = (1 - P_{d,A}) \Pr(H_1) + (1 - P_{f,A}) \Pr(H_0)$$

$$\Pr(o_m = 1) = P_{d,A} \Pr(H_1) + P_{f,A} \Pr(H_0)$$

and for a Mode B test,

$$\Pr(o_m = 0) = (1 - P_{d,B}^{(n)}) \Pr(h_n) + (1 - P_{f,B}^{(n)})(1 - \Pr(h_n))$$

$$\Pr(o_m = 1) = P_{d,B}^{(n)} \Pr(h_n) + P_{f,B}^{(n)} (1 - \Pr(h_n))$$

In these expressions, $P_{d,B}^{(n)}$ and $P_{f,B}^{(n)}$ are the probabilities of detection and false alarm,
respectively, of the Mode B detector used on cell $n$. $P_{d,A}$ and $P_{f,A}$ are the corresponding
probabilities for the Mode A detector.

To summarize, the decision rule $\phi$ depends on the outcome of the test and the posterior
probabilities of $h_n$, $n = 1, \ldots, N$. These can be computed before any test is actually run.
Thus, for each candidate test, the expected risk may be calculated using $\Pr(o_m = 0)$ and
$\Pr(o_m = 1)$ (which come from the detector performance figures) before running any tests.
This allows the selection of the test of lowest Bayes risk, as proposed.
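
The selection procedure can be sketched in a few lines. The sketch makes several assumptions that are not taken from the paper: detection and false alarm probabilities are fixed per mode rather than recomputed from the priors, the posterior update is a plain application of Bayes' rule (the explicit calculations are in [1]), and the expected loss for each outcome is obtained by summing the loss functional over the posterior directly rather than through the closed-form cases. All function and parameter names are illustrative.

```python
def loss(theta, phi, c1, c2):
    """Loss functional: theta is the true state, phi the decision (0 = no target)."""
    if theta == phi:
        return 0.0
    if theta == 0:
        return c1   # false alarm
    if phi == 0:
        return c2   # missed target
    return 1.0      # target detected but wrong cell chosen


def outcome_likelihoods(test, n_cells, pd_a, pf_a, pd_b, pf_b):
    """Pr(o = 1 | h_n) for n = 0..N; test 0 is Mode A, test n > 0 is Mode B on cell n."""
    if test == 0:
        return [pf_a] + [pd_a] * n_cells
    return [pd_b if n == test else pf_b for n in range(n_cells + 1)]


def posterior(prior, like1, outcome):
    """Bayes update of prior[n] = Pr(h_n); also returns Pr(o = outcome)."""
    like = like1 if outcome == 1 else [1.0 - p for p in like1]
    joint = [p * q for p, q in zip(prior, like)]
    total = sum(joint)
    if total == 0.0:
        return list(prior), 0.0
    return [j / total for j in joint], total


def decide(post):
    """Decide H0 if Pr(H0 | o) > Pr(H1 | o), otherwise the most probable cell."""
    if post[0] > sum(post[1:]):
        return 0
    return max(range(1, len(post)), key=lambda n: post[n])


def expected_risk(prior, test, c1, c2, pd_a, pf_a, pd_b, pf_b):
    """R_0 Pr(o = 0) + R_1 Pr(o = 1) for one candidate test."""
    like1 = outcome_likelihoods(test, len(prior) - 1, pd_a, pf_a, pd_b, pf_b)
    risk = 0.0
    for outcome in (0, 1):
        post, p_outcome = posterior(prior, like1, outcome)
        phi = decide(post)
        risk += p_outcome * sum(
            post[theta] * loss(theta, phi, c1, c2) for theta in range(len(post))
        )
    return risk


def select_test(prior, c1, c2, pd_a, pf_a, pd_b, pf_b):
    """Return the index of the test with lowest expected risk and all the risks."""
    risks = [
        expected_risk(prior, t, c1, c2, pd_a, pf_a, pd_b, pf_b)
        for t in range(len(prior))
    ]
    best = min(range(len(risks)), key=lambda t: risks[t])
    return best, risks


# Hypothetical one-cell case with a perfect Mode B detector.
best, risks = select_test([0.5, 0.5], c1=1.2, c2=2.0,
                          pd_a=0.8, pf_a=0.2, pd_b=1.0, pf_b=0.0)
```

In this toy case the Mode B test on cell 1 has zero expected risk and is selected; the Mode A test has expected risk of about 0.32. After the chosen test is run, the posterior for the observed outcome becomes the prior for the next selection.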

4.  EXAMPLES 

The following two examples show the behavior of the two-mode detection/localization
system operating in a five-cell (i.e., $N = 5$) scenario. The test signal and white gaussian
noise vectors are of length $M = 10$ and the SNRs in the two modes are -3.1 dB and -6.1
dB. The cost values are $c_1 = 1.2$ and $c_2 = 2$. In the first example (FIG. 2), the initial prior
probabilities are $\Pr(h_1) = .1472$, $\Pr(h_2) = .0749$, $\Pr(h_3) = .0935$, $\Pr(h_4) = .1178$, and
$\Pr(h_5) = .0667$. The posterior probabilities of the first test, which are used as the prior
probabilities in the second test, appear in the first column of the grid, and so forth. In
this example, a signal source is actually present in cell 4 (indicated by a triangle in the
upper right corner). The system chooses Mode A for the initial test (indicated by shading
of the cells in the first column), does not detect (per the annotation beneath the column),
and decides for $H_0$ (indicated by lack of highlighted frame around any cell). Mode A is
selected again in the second test; the system detects and chooses cell 1. Following test 2,
the system runs in Mode B on cell 1, does not detect but decides for cell 4 because it has
the highest posterior (and $p_1 > 0.5$). In test 4, the detector runs again in Mode B but on
cell 4, detects and decides for cell 4.

In the second example (FIG. 3), the initial prior probabilities are $\Pr(h_1) = .2385$,
$\Pr(h_2) = .1006$, $\Pr(h_3) = .1315$, $\Pr(h_4) = .0239$, and $\Pr(h_5) = .0056$. In this example,
a signal source is actually present in cell 3. It is interesting to note how the system switches
from Mode B back to Mode A in test 4.

FIG. 2. Behavior of the two-mode detector in a five-cell scenario (Example 1).

FIG. 3. Behavior of the two-mode detector in a five-cell scenario (Example 2).


5.  DISCUSSION  AND  CONCLUSIONS 
Since  beginning  this  work,  the  authors  have  become  aware  of  some  fine  research  on 
related  problems  involving  mode-switchable  sensors,  most  notably  by  K.  Kastella  and  his 
colleagues  (see,  e.g.,  [3, 4]). 

Work  currently  underway  is  examining  the  mean  time  to  correct  decision  of  the  approach 
presented  here  for  various  operating  parameters,  choice  of  detection  thresholds  for  the 
individual tests to minimize this mean time, and possible applications outside the context
of  the  original  problem. 

Abstract

Technology is changing the way in which telephone service can best be offered. Changes in
regulatory policy impact the way in which telephone service is allowed to be offered. The capital
market's growing awareness of the money to be made in telecommunications has provided the
business opportunity to build new and alternative systems. The combination of these trends will
radically alter how the defense and intelligence communities deal with telephone systems, both as
a target and as an enabling force.

Introduction 

The telephone instrument sitting on your desk at work or at home is closely related in appearance
and in electrical design to the telephone set of 100 years ago. From this it might be reasonable to
presume that the underlying telephone network has stayed the same over that same interval and
will stay the same in the future. In fact, neither of these presumptions is true. Technology is
changing the way in which telephone service can best be offered. Changes in national and
international regulatory policy impact the way in which telephone service is allowed to be offered.
The capital market's growing awareness of the money to be made in telecommunications has provided
the business opportunity to build new systems, often using alternative technologies. The
combination of these trends will radically alter how the defense and intelligence communities deal
with telephone systems, both as a target and as an enabling force.

In  this  paper  we  examine  a  few  of  these  important  changes. 

Historical  Background 

To  understand  the  impact  of  these  technical,  financial,  and  regulatory  changes,  it  is  useful  to 
review  certain  aspects  of  the  history  of  telephony. 

1. Until the last decade, telephony systems focused on voice traffic and adapted all other types of
signals they carried (e.g., television, data, and fax) to conform to a network optimized for the
transport of voice.

2. The construction of telephone systems was highly capital-intensive, with most of the cost
being concentrated in wire and other transmission facilities. As a result, long-distance
telephony was expensive and usually billed proportionally to the distance covered by the call and
its duration. Virtually all telephone systems were local or national monopolies, and, other than
in North America, most were government-owned, government-run organizations known as
“post, telephone, and telegraphs (PTTs).” They were “vertically integrated” in that the PTT
supplied the telephone instrument and owned all of the assets needed to provide telephone
service.

3. In virtually all countries, the services needed by businesses, such as long-distance calling and
private branch exchanges (PBXs), were substantially overpriced in order to subsidize
residential and rural telephone service.

4. In most countries the PTT employed many people and, in many countries, was an important
conduit for graft to those running the government. These two factors have historically
discouraged the introduction of cost-saving or labor-reducing technology into those countries.

The realization that good telecommunications are a prerequisite for national economic
development has encouraged many countries to privatize their PTTs or, at the least, permit competition
with the PTT. The rapid improvement in the technology needed to build telecommunications
systems and the associated falling cost have facilitated the development of these “parallel” systems.
The opportunity to build and operate such systems at a profit has attracted a large amount of
capital from the financial marketplace. The combination of these factors is causing tremendous
change in the way that systems are designed, built, and operated.

The “Death of Distance”

To determine the most economical design of a telephone network it is necessary to perform
tradeoff analysis among the most costly components. Traditionally these have been the
transmission segment and the switching segment. Historically, the costs of transmission were high and the
use of longer lines led to lower signal quality owing to the accumulation of noise in analog
systems. This is no longer true. The incredible improvement in the capability of optical fiber to
transport telecommunications signals over the past twenty years can and will cause the complete
reorganization of network topologies.

Consider first the comparative cost trends shown in Figure 1. In general the cost of computation
and packet routing has fallen over time at roughly the same rate that semiconductors have
improved in speed. Circuit switching has improved somewhat faster, but the most significant
improvement is that in fiber-based transport. Better fiber, faster electronics, and the advent of
wavelength division multiplexing (WDM) have lowered the cost of hauling a bit of information by
several orders of magnitude over the past twenty years. Almost as important is that the signal's
quality has become nearly independent of the distance over which it is hauled. The combination
of these two, the virtual removal of transport cost as a consideration in pricing a call and the
near-total maintenance of signal quality, has led to what some have called “the death of distance” as a
concern in the design of a telephone system and the price of a phone call.

The impact of these facts on the topology of a telephone network can be seen in Figure 2. On the
left is a network designed in the traditional world where switching costs less than transmission. In
this world many layers of switches are employed, with the principal design goal being to
minimize, on average, the distance over which a call is transported. Minimizing this distance, in turn,
minimizes the cost of transporting the call and maximizes its quality.

The right side of Figure 2 shows the effect of low-cost, high-quality transmission. After
concentrating calls locally they are all hauled to a central location for switching and distribution.
End-to-end signal quality is better (compared to an analog transmission system), fewer switches are
needed, and less control and signaling are needed. In every dimension costs are lowered and
service is improved.

Figure 1: Comparative Trends in the Cost of Various Segments of a Telephone Network


Figure  2:  Cheaper  Transport  Will  Encourage  the  Design  of  Networks  with  Longer  Transmission  Links  and  Fewer 
Switching  Points 

The economic attractiveness of this approach has already encouraged the new entrants into the
telecom marketplace to use it, and the presence of their lower-cost competition has encouraged
the established organizations to emulate it as best they can.

This type of network architecture has a number of defense implications, including the greater
difficulty of defending a geographically dispersed network. It also has the implication that a call
thought by the caller to be a local one might, in fact, travel well out of the local area and back.

Retailers,  Wholesalers,  and  Resellers 

Until the middle 1970s in the U.S. and much more recently in the rest of the world, the
organizations which ran telephone networks were “vertically integrated,” that is, they provided the
telephone itself, the copper loop to the central office, the central office and its staff, and all of the
switching and transmission equipment associated with hauling calls from one central office to
another. Many of these organizations argued that there were cost advantages to this integration,
while others insisted that the quality of service to the customers could only be ensured by
operating their system as an integrated whole. Others yet claimed that national security drove the need
for a completely government-owned, and therefore “single-vendor,” system.

Many of the conditions that supported these positions are now gone. Organizations needing to
expand or improve their networks have become increasingly open to the approach of leasing
rather than owning the necessary new assets. Countries seeking to privatize their networks have
found it convenient in many cases to privatize only pieces of them.

The past twenty years have seen a contrary trend, however. In this interval, there has been a
growing fractionation of the telephone business into “retailers,” those who sell telecommunications
services directly to customers, and “wholesalers,” those who sell their telecommunications
services to telecommunications retailers.

In theory, any part (or all) of a telephone system can be sold at wholesale rates to a retailer. (A
retailer with no physical plant whatsoever is usually called a “reseller,” while one that owns plant is
termed “facilities-based.”) Even so, and even though more of that will be seen over the next
several years, the most important form of wholesalers at this time are the “transport providers” who
sell fiber-optic transmission capacity to facilities-based retailers. This trend is being driven by a
number of complementary considerations, including the following:

1. High-quality bandwidth has become a non-scarce commodity, unlike twenty years ago when
only a few had the resources or technical knowledge to build it.

2. The rapid improvement in fiber-based technology gives a substantial cost advantage to that
transport company which most recently installed its system.

3. Network providers (e.g., the retailers) are now willing to lease transport (from wholesalers)
rather than own it.

4. It is possible to build a viable business proposition as a transport wholesaler (e.g., FLAG,
Atlantic Crossing, and Project Oxygen), in that it is possible to borrow money to fund the
business, to find suppliers (e.g., Lucent) to sell the necessary materials, and to find customers
(e.g., the retailers) for the product (e.g., bits).

5. Some entrants into the wholesale transport business have an economic edge over others,
including some incumbent telephone organizations, since they serendipitously own
right-of-way on which fiber-optic systems can be installed. Examples include power transmission
companies, pipelines, and railroads (e.g., ENRON, Williams, and Qwest, respectively).

What are the implications of this for the design and operation of telephone networks? There are
two big ones. The first is that the network providers (the retailers) will shop for the best price
available among many possible, essentially equivalent transport suppliers. An immediate
corollary is that each retailer may change vendors whenever a cheaper transport alternative is found.
The converse implication is that the transport providers will manage their assets aggressively to
minimize their own costs, allowing them to keep their prices low and therefore to maintain their
customer base.

How then does this affect the defense establishment? The fact that there are a number of network
providers and each of them has access to many transport vendors suggests the availability of a
highly redundant and robust communications infrastructure, one capable of serving many users
and surviving many local outages or failures. From the opposite perspective, that of attempting to
intercept or interdict the communications of an adversary, the availability of a highly diverse and,
in fact, time-varying network of networks makes the targeting of that interception very difficult.

The  Death  of  the  PTTs  and  the  Rise  of  the  RPOAs 

There is a growing international realization that modern telecommunications are needed to
support economic development. Unfortunately, the wholesale improvement of a country's
telecommunications infrastructure is a very expensive proposition. This has led to the decision in many
countries to privatize their PTTs or, at the least, to permit some degree of competition with the
national PTT, both serving as a mechanism to attract the investment capital needed to buy and
install new equipment. This approach has been very effective when it has been applied, creating
modern telecommunications capabilities and generally lowering the prices charged to consumers.
It has also had the effect of moving the world away from the old model of government-owned,
government-operated telephone systems and toward a new one in which national and
transnational companies own and operate significant portions of national systems. Historically these
companies have been the exception (e.g., the Bell System) and government ownership has been
the rule. In international regulatory bodies (e.g., the CCITT and now the ITU-T) the private
companies were called “recognized private operating authorities (RPOAs).” The ultimate effect of
countries resorting to the free market to build or rebuild their telephone systems is that the PTTs
will fade as operators over time and that national and transnational RPOAs will grow to replace
them.

As competition with the PTTs is permitted, and as other regulatory barriers are lowered, new
network providers will form at the international, regional, and local levels. At this time each of the
aspirants for the role of global network provider has at least one American corporate partner.

The rise of global RPOAs will lead to the demise of the practice used by PTTs of establishing
long-term, “direct” transmission routes between each pair of countries. Such an example is shown
in Figure 3. In this case, the PTTs of Brazil and Egypt have contracted with INTELSAT to create
a direct link between their two countries. The modern model is shown in Figure 4. In this case, the
national telephony systems, whether they be governmental PTTs or private organizations, contract
with a global RPOA to haul their traffic. In some cases, the national organizations might lease a
dedicated trunk group, but in others the individual telephone calls will be routed through the
RPOA's network on a path dependent on the instantaneous loading of the network.

The emergence of these RPOAs further exacerbates the issues illuminated earlier regarding the
retailing and wholesaling of telecommunications. The theoretical reliability and survivability of
communications between any two points will improve and the cost will fall as technology
improves and competition grows. The converse is also true. The ability to predict the pathway that
an adversary's communications will take becomes that much harder as well.

Figure 4: Putative Example of a Network Built by a Recognized Private Operating Authority (RPOA)


The  Big  Picture 

So  what  does  the  future  hold? 

1. Companies, not countries, will be running the world's international (and, progressively,
national) telephone networks.

2. There will be less and less stability in who the telecom retailer is for a specific user and, in
turn, less stability in who the transport provider is for each retailer and reseller.

3. Packetized voice, particularly over “managed networks,” will become reality at the national
and international levels.

From the perspective of national defense, these trends are two-edged. The multilayered
redundancy implied by the increasingly commercial marketplace bodes well for those wanting reliable
communications, while complicating the life of those with the responsibility for intercepting or
interdicting the communications of adversaries.

The broad goals of this research are to develop methods for using
nonparametric, statistical tolerance intervals (1) to measure
the performance of signal processing algorithms on real data and
(2) to provide data-adaptive procedures for using such measures to
adaptively control the performance of communication, detection,
classification, localization, and tracking systems. The presentation
is concentrated on the details of a specific example, the design of a
nonparametric, constant false alarm rate (CFAR), data-adaptive
detection threshold.

The major objective is to develop methods for designing robust
radar, sonar and communication systems using real measured
data. Our approach is to provide the means, through application of
tolerance regions, for such systems to quickly recognize the
presence of a new, statistically different environment and
quickly adapt to preserve or improve performance. Another
objective is to develop methods that enable one to assess the
performance of signal processing algorithms on real data so that
the performances of different methods can be fairly compared
using real data sets.

Nonparametric, statistical tolerance regions provide a distribution-free
measurement of the range of experimental outcomes and of
the uncertainty of the observed range. Hence, tolerance regions
have the potential to allow one to measure, assess, and control
performance of signal processing algorithms on real data.

Tolerance Regions, also called Tolerance Intervals (TIs), were
defined and developed over a productive period of only a few
years in the 1940s by the statisticians Wilks, Wald, Tukey,
Robbins, and Scheffé. TIs were not applied in signal processing
until recently. Streit and Luginbuhl, in their paper "Maximum
Likelihood Training of Probabilistic Neural Networks", IEEE
Trans. on Neural Networks, vol. 5, pp. 764-783, September 1994,
used TIs and Gaussian mixtures to significantly extend the
performance and design methodology of PNN classifiers.

Real and Tufts, motivated by the above paper, conversations
with Streit, and study of the original statistical papers,
derived a method for estimation of threshold values (or, in
statistical terminology, quantiles) for signal detection and
classification systems in which a prescribed value of
probability of false alarm (PFA) is needed. The objective of Real
and Tufts was to find, nonparametrically, from the observed
training data, a maximum-likelihood small-interval estimate of
the desired threshold (or quantile). A confidence-interval
interpretation can be made of the ROC curves which
result from this design procedure. Papers by Real and Tufts can
be found in the January 1999 IEEE Signal Processing Letters and
in the Proceedings of the 1999 IEEE ICASSP Conference.

What are Tolerance Regions (TRs)? For simplicity let’s discuss
a univariate TR. First, a TR, consisting of a set of
nonoverlapping intervals, is a random region. It is a random
region because the endpoints of the intervals are functions of
the training data. Second, these functions of the data are
special and utilize order statistics, because only then can we
make precise probability statements about the underlying,
unknown population. For example, we can specify a probability,
the "confidence level", that the coverage of the population
by the TR is at least a specified value, called the "tolerance
proportion".
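
As a concrete illustration of the confidence level and the tolerance proportion, consider the simplest one-sided case: the region is (-inf, X_(r)], where X_(r) is the r-th smallest of n independent training samples drawn from a continuous distribution. Under those assumptions the coverage of the region follows a Beta distribution whatever the population, so the confidence reduces to a binomial sum. The sketch below is only an illustration of that textbook fact, not the design procedure of Real and Tufts; the function names are hypothetical.

```python
from math import comb


def coverage_confidence(n, r, beta):
    """Probability that (-inf, X_(r)] covers at least a fraction `beta` of the
    population, where X_(r) is the r-th smallest of n i.i.d. samples from a
    continuous distribution. This equals P(Binomial(n, beta) <= r - 1)."""
    return sum(comb(n, j) * beta**j * (1.0 - beta) ** (n - j) for j in range(r))


def smallest_rank(n, beta, gamma):
    """Smallest rank r whose order statistic achieves tolerance proportion
    `beta` with confidence at least `gamma`; None if n samples are too few."""
    for r in range(1, n + 1):
        if coverage_confidence(n, r, beta) >= gamma:
            return r
    return None


if __name__ == "__main__":
    # With noise-only training data, coverage 0.95 below the threshold
    # corresponds to a false-alarm probability of at most 0.05.
    for n in (58, 59, 200):
        print(n, smallest_rank(n, beta=0.95, gamma=0.95))
```

Using X_(r) as a detection threshold then bounds the false-alarm probability by 1 - beta with confidence gamma, without any assumption about the noise distribution beyond continuity and independence.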

What are some other applications of TRs? Real, Yannone, and
Tufts, in their paper "Comparison of Two Methods for
Multispectral 3-D Detection of Single Pixel Features in Strong
Textured Clutter", Conf. Proc. IEEE IMDSP98, July 1998, show
how TRs can be used to compute and compare the ROC-curve
performances of two different detection methods using only real,
clutter-filled image data. Qi Li and D. W. Tufts, in their
paper "Principal Feature Classification", IEEE Trans. Neural
Networks 8, pp. 155-160 (Jan. 1997), show how to prune a large
set of proposed features to obtain a smaller, but very
effective, subset of features. This method can be improved by
using tolerance regions to control the probability of
misclassification.

In summary, application of tolerance-region concepts in the
design of adaptive systems can provide performance which can be
reliably controlled and predicted, based on real, measured data.

In this paper, we investigate the use of the sequences in the Prometheus orthonormal
set (PONS) for application as codewords in orthogonal frequency division
multiplexing (OFDM) communication systems. The energy spreading properties of
the PONS sequences are well suited to addressing the problem of controlling the
peak-to-mean envelope power ratio (PMEPR) for OFDM systems. It is shown that
the matrix consisting of the rows of the PONS matrix of order $2^m$, together with
their antipodal counterparts, can be identified with a coset of the first-order
Reed-Muller code RM(1, m) inside the second-order code RM(2, m), thereby
establishing a connection between the PONS (and hence Shapiro) sequences, and classical
error-correcting codes.

The PMEPR values obtained for codewords in the PONS sets are, as may be
expected, similar to those attained by the construction based on general Golay
complementary pairs recently proposed by Davis and Jedwab. While the Golay construction
generates many more codewords of a given length than do the PONS sequences, the
PONS construction provides a complete orthogonal set of sequences which consist
of Golay pairs. Generalization of the (classical) PONS construction can yield other
(non-orthogonal) families.

1. INTRODUCTION

In  orthogonal  frequency  division  multiplexing  (OFDM)  modulation  schemes,  data  is 
transmitted  simultaneously  over  multiple  equally  spaced  carrier  frequencies,  using  fast 
Fourier  transform  (FFT)  processing  for  modulation  and  demodulation.  OFDM  offers  many 
advantages  for  transmission  at  high  data  rates  over  time-dispersive,  fading  and  multipath 
channels  at  low  signal-to-noise  ratios  [1].  The  method  has  been  proposed  for  digital  audio 
and  video  broadcasting,  with  the  IEEE  802.11  Draft  Standard  making  use  of  OFDM  for 
wireless  local  area  networks  (LANs). 

The principal difficulty with OFDM is the high peak-to-mean power ratio of uncoded
OFDM signals. That is, when sinusoidal signals of n carriers add constructively, the peak
envelope of the transmitted power is large, as high as n times the mean envelope power.
The requirement for large peak transmitter power introduces a host of practical difficulties,
particularly in mobile applications, where battery power is a constraint. Moreover,
regulatory limits on peak power reduce the effective range of OFDM transmissions, and may
require power amplifiers to operate in regions where power is converted inefficiently.
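
The factor of n can be checked numerically. The sketch below is a minimal illustration, assuming a baseband model in which carrier k contributes c_k exp(2 pi i k t) over one symbol period (the carrier offset does not affect the envelope) and in which the peak is estimated on an oversampled time grid, so it may slightly underestimate the true peak. The function names and the oversampling factor are assumptions made for the example.

```python
import cmath
import math


def envelope_power(codeword, t):
    """Instantaneous envelope power |s(t)|^2 at normalized time t in [0, 1)."""
    s = sum(c * cmath.exp(2j * math.pi * k * t) for k, c in enumerate(codeword))
    return abs(s) ** 2


def pmepr(codeword, oversample=16):
    """Sampled peak-to-mean envelope power ratio of one OFDM symbol."""
    n = len(codeword)
    # The time average of |s(t)|^2 is the sum of |c_k|^2 (n for +/-1 symbols).
    mean_power = sum(abs(c) ** 2 for c in codeword)
    samples = oversample * n
    peak = max(envelope_power(codeword, i / samples) for i in range(samples))
    return peak / mean_power


if __name__ == "__main__":
    n = 8
    print("all-ones codeword:", pmepr([1] * n))        # carriers add in phase at t = 0
    print("Golay sequence   :", pmepr([1, 1, 1, -1]))  # bounded by 2
```

The all-ones codeword attains the worst case because every carrier is in phase at t = 0, whereas the length-4 Golay sequence stays within the bound of 2 discussed below.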

An idea which has emerged over the past few years is to use block coding to transmit
across the n carriers only those binary sequences which lead to small peak-to-mean
envelope power ratio (PMEPR). The first such approaches used exhaustive searches to identify
the best sequences in terms of small PMEPR, and required large lookup tables for encoding
and decoding.

Recently, Davis and Jedwab [5], [6] announced a previously unrecognised connection
between Golay complementary sequences, whose good PMEPR properties had long been
recognised [2], [10], and second-order Reed-Muller codes, with good error correction
properties and efficient algorithms for encoding and decoding. The essence of [5], [6]
(see also [9]) is to allow transmission across the carriers of only those ±1 sequences
belonging to a Golay complementary pair. While it had been known since work of Boyd [2] and
Popović [10] that the use of Golay sequences as codewords to control the modulation of
carrier signals results in OFDM with PMEPR at most 3 dB, the key contribution of Davis
and Jedwab was to establish that in addition to controlled PMEPR properties, Golay
sequences also possess sufficient intrinsic structure to form a practical error-correction code.

In an independent line of research, Byrnes [3] constructed a sequence of polynomials
with coefficients ±1, each of which has a PMEPR at most 2 or, equivalently, a crest factor
of $\sqrt{2}$. The coefficients in the polynomials occur as the rows of a $2^m \times 2^m$ matrix and are
all orthogonal (a Hadamard matrix) and, when suitably normalised, form a complete
orthonormal basis for the space $L^2(0, 2\pi)$ while preserving the flatness and unimodularity of
the Shapiro polynomials [12]; see also [11]. In fact the rows occur as Golay
complementary pairs. In view of the “energy spreading” properties of this sequence of polynomials,
which we hereafter term the Prometheus orthonormal set (PONS) [4], it is natural to ask
what potential application they might find in OFDM-based communications systems.

In this paper, we show that with $P$ the PONS matrix of order $2^m$, the $2^{m+1} \times 2^m$ matrix
formed from the rows of $P$ and its antipodal counterpart $-P$ can be identified with a
subcode of the second-order Reed-Muller code RM(2, m) under the mapping $u_i \mapsto (-1)^{u_i}$.
Moreover, we show that the PONS sequences and their antipodal counterparts can be
generated as a coset of the first-order Reed-Muller code RM(1, m) with the classical Shapiro
sequence used as a coset representative. Other properties of PONS notwithstanding, the
PONS sequences therefore amount to the first-order Reed-Muller code RM(1, m) with an
additive offset.
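
The coset statement can be checked by brute force for small m. The sketch below is an illustration only, assuming the ±1 concatenation rule described in Section 2 (starting matrix with rows [1 1] and [1 -1]) and the mapping 0 -> +1, 1 -> -1; the helper names are hypothetical. It adds the first PONS row to every row and every negated row modulo 2 and compares the result with the set of all affine Boolean functions, which is RM(1, m).

```python
from itertools import product


def pons_matrix(m):
    """Rows of the +/-1 PONS matrix of order 2**m built by concatenation."""
    rows = [[1, 1], [1, -1]]
    for _ in range(m - 1):
        doubled = []
        for k in range(0, len(rows), 2):
            a, b = rows[k], rows[k + 1]
            minus_b = [-x for x in b]
            doubled.extend([a + b, a + minus_b, b + a, minus_b + a])
        rows = doubled
    return rows


def rm1_codewords(m):
    """All 2**(m+1) codewords of RM(1, m): truth tables of affine functions."""
    words = set()
    for coeffs in product((0, 1), repeat=m + 1):
        const, linear = coeffs[0], coeffs[1:]
        words.add(tuple(
            (const + sum(linear[j] * ((x >> j) & 1) for j in range(m))) % 2
            for x in range(2 ** m)
        ))
    return words


def pons_is_rm1_coset(m):
    """True if the PONS rows and their negatives form one coset of RM(1, m)."""
    rows = pons_matrix(m)
    signed = rows + [[-x for x in r] for r in rows]
    binary = [tuple((1 - x) // 2 for x in r) for r in signed]  # +1 -> 0, -1 -> 1
    representative = binary[0]  # first row, used here as coset representative
    differences = {tuple(u ^ v for u, v in zip(representative, w)) for w in binary}
    return differences == rm1_codewords(m)


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, pons_is_rm1_coset(m))
```

Because the set of affine functions is unchanged by relabelling the m index bits, the check does not depend on the bit-ordering convention used for the positions within a row.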

PONS  is  therefore  one  of  the  cosets  identified  by  Davis  and  Jedwab  as  being  suitable 
for  application  in  OFDM.  The  more  general  Golay  construction  of  Davis,  Jedwab  and 
Paterson  generates  many  more  codewords  of  a  given  length  than  a  basic  PONS  sequence. 
However  the  PONS  sequences  of  a  given  length  form  a  maximal  orthogonal  set  of  such 
sequences.  The  PONS  construction  has  some  flexibility  and  so  can  generate  many  such 
maximal  orthogonal  sets  of  a  given  length.  It  would  be  of  interest  to  know  whether  all  of 
the  codes  of  Davis,  Jedwab  and  Paterson  can  be  generated  by  the  PONS  construction. 


2. GOLAY SEQUENCES AND THE PONS CONSTRUCTION

Let $a = (a_0, a_1, \ldots, a_{n-1})$ and $b = (b_0, b_1, \ldots, b_{n-1})$, where $a_i, b_i \in \mathbb{Z}_2$. The
aperiodic autocorrelation of $a$ at displacement $\ell$ is $C_a(\ell) = \sum_i \omega^{a_i - a_{i+\ell}}$, where the summation
is understood to be over only those integer values for which both $i$ and $i + \ell$ lie within
$\{0, 1, \ldots, n-1\}$, and where $\omega = e^{i\pi} = -1$. The sequences $a$ and $b$ are called a Golay
complementary pair over $\mathbb{Z}_2$ if $C_a(\ell) + C_b(\ell) = 0$ for each $\ell \neq 0$. Any sequence which is a
member of a Golay complementary pair is called a Golay sequence. With a slight abuse of
notation, we will also refer to ±1 sequences $a^0$ and $a^1$ as forming a Golay complementary
pair, by which is meant that

$$A(a^0)(\ell) + A(a^1)(\ell) = 0, \quad \ell \neq 0, \qquad (1)$$


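where the aperiodic autocorrelation function $A(a)(\ell)$ of $a = [a_0\ a_1 \ldots a_{n-1}]$ is defined:

$$A(a)(\ell) = \begin{cases} \sum_{i=0}^{n-\ell-1} a_i \overline{a_{i+\ell}}, & 0 \le \ell < n, \\ \sum_{i=0}^{n+\ell-1} a_{i-\ell} \overline{a_i}, & -n < \ell < 0, \\ 0, & \text{otherwise.} \end{cases} \qquad (2)$$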
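
For ±1 sequences the complementary-pair condition is easy to test directly. The following is a minimal sketch, assuming real ±1 entries (so no complex conjugation is needed) and using hypothetical function names; it evaluates the aperiodic autocorrelation at each displacement and checks that the two sequences cancel at every nonzero displacement.

```python
def aperiodic_autocorrelation(a, shift):
    """Aperiodic autocorrelation of a real sequence a at displacement `shift`."""
    n = len(a)
    if 0 <= shift < n:
        return sum(a[i] * a[i + shift] for i in range(n - shift))
    if -n < shift < 0:
        return sum(a[i - shift] * a[i] for i in range(n + shift))
    return 0


def is_golay_pair(a, b):
    """True if a and b have autocorrelations summing to zero off the origin."""
    if len(a) != len(b):
        return False
    return all(
        aperiodic_autocorrelation(a, shift) + aperiodic_autocorrelation(b, shift) == 0
        for shift in range(1, len(a))
    )


if __name__ == "__main__":
    print(is_golay_pair([1, 1, 1, -1], [1, 1, -1, 1]))   # a complementary pair
    print(is_golay_pair([1, 1, 1, 1], [1, 1, -1, -1]))   # not complementary
```

Only positive displacements need to be checked, since for real sequences the autocorrelation at a negative displacement equals the value at the corresponding positive one.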


The PONS construction of ±1 matrices of order $2^m$, denoted $P_{2^m}$, is presented in [3],
and derives from the idea of the Shapiro transform of a unimodular sequence [12]. The
presentation here is an inductive method based on a matrix concatenation rule presented
by Byrnes [4]. Starting with the matrix

$$P_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},$$

the concatenation rule

$$\begin{bmatrix} A \\ B \end{bmatrix} \longrightarrow \begin{bmatrix} A & B \\ A & -B \\ B & A \\ -B & A \end{bmatrix} \qquad (3)$$

is applied, where $A$ and $B$ are two consecutive matrix rows. Thus the rule (3) means that
the first row of $P_4$ is the concatenation of $A = [1\ 1]$ and $B = [1\ {-1}]$; the second row
of $P_4$ is the concatenation of $A$ and $-B$, and so on. To obtain the matrix $P_8$, we first take
the pair $A$, $B$ to be the first two rows of $P_4$, and use the concatenation rule (3) to obtain
the first four rows of $P_8$. Then we take the pair $A$, $B$ to be the next two rows of $P_4$, and
use the rule (3) to obtain the next four rows of $P_8$, and so on. The sequence of matrices
constructed in this way will be termed the original PONS sequence to distinguish them
from more general constructions [4].
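
The inductive construction can be sketched in a few lines. The code below is an illustration under the reading of the rule given in the text (rows taken two at a time, each pair A, B producing the four rows [A B], [A -B], [B A], [-B A]); the function names are hypothetical. It also checks two properties stated in this paper: the rows are mutually orthogonal, and consecutive rows form Golay complementary pairs.

```python
def pons_matrix(m):
    """Rows of the original +/-1 PONS matrix of order 2**m (m >= 1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    rows = [[1, 1], [1, -1]]
    for _ in range(m - 1):
        doubled = []
        for k in range(0, len(rows), 2):
            a, b = rows[k], rows[k + 1]
            minus_b = [-x for x in b]
            doubled.extend([a + b, a + minus_b, b + a, minus_b + a])
        rows = doubled
    return rows


def is_orthogonal(rows):
    """True if distinct rows have zero inner product and each has norm**2 = n."""
    n = len(rows[0])
    return all(
        sum(x * y for x, y in zip(r, s)) == (n if i == j else 0)
        for i, r in enumerate(rows)
        for j, s in enumerate(rows)
    )


def autocorrelation(a, shift):
    """Aperiodic autocorrelation of a real sequence at a positive displacement."""
    return sum(a[i] * a[i + shift] for i in range(len(a) - shift))


def consecutive_rows_are_golay_pairs(rows):
    """True if rows (0,1), (2,3), ... are each Golay complementary pairs."""
    n = len(rows[0])
    return all(
        autocorrelation(rows[k], shift) + autocorrelation(rows[k + 1], shift) == 0
        for k in range(0, len(rows), 2)
        for shift in range(1, n)
    )


if __name__ == "__main__":
    for row in pons_matrix(3):
        print(row)
    print(is_orthogonal(pons_matrix(3)), consecutive_rows_are_golay_pairs(pons_matrix(3)))
```

The first row produced at each order is the beginning of the classical Shapiro sequence, which is why it is a natural coset representative.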


3.  POLYPHASE  PONS 

The PONS construction is capable of extension by the replacement of −1 by higher
roots of unity. This provides codes with similar energy spreading properties to the ±1
construction but now with values in the powers of a root ω of 1. We illustrate the idea with
the case when ω is a primitive cube root of 1. This time we start with the matrix


$$P_3 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{bmatrix},$$

and  the  concatenation  rule  is 


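[Equation (4): the concatenation rule maps the row triple (A, B, C) to nine rows, each a
concatenation of three blocks drawn from A, B and C and scaled by powers of ω. The exact
arrangement of the blocks is not legible in this copy.]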


This rule is applied to three consecutive matrix rows A, B and C. The resulting codes
are of length $3^n$ and $3^n$ such codes are produced which are orthogonal, thus producing a
maximal (i.e. complete) orthogonal set. The aperiodic autocorrelations $A(a)$ for these (as
defined in (2)) satisfy an equation like (1) but with three terms:

$$A(a^{3k+1})(\ell) + A(a^{3k+2})(\ell) + A(a^{3k+3})(\ell) = 0, \quad \ell \neq 0, \tag{5}$$

for any $k = 0, 1, \ldots, 3^{n-1} - 1$, where the length of $a^j$ is $3^n$. This gives an upper bound
for their PMEPR of 3. Of special interest are the codes corresponding to ω = i, which are
of length $4^n$.
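
The three-term condition can be checked numerically in the smallest case (n = 1, k = 0). The sketch below is hedged on two assumptions that the text does not spell out: the starting matrix is taken to be the 3-point Fourier matrix with $\omega = e^{2\pi i/3}$, and the autocorrelation of a complex sequence is taken with the second factor conjugated. The helper name `aperiodic_autocorr` is hypothetical.

```python
import numpy as np

omega = np.exp(2j * np.pi / 3)        # assumed primitive cube root of unity
P3 = np.array([[1, 1, 1],
               [1, omega, omega**2],
               [1, omega**2, omega]])  # assumed starting matrix (3-point Fourier matrix)

def aperiodic_autocorr(a):
    """Lags 0..n-1 of sum_i a[i] * conj(a[i + l]) (conjugation is an assumption)."""
    n = len(a)
    return np.array([np.sum(a[:n - l] * np.conj(a[l:])) for l in range(n)])

gram = P3 @ P3.conj().T                                  # row inner products
acf_sum = sum(aperiodic_autocorr(row) for row in P3)     # three-term sum over the rows
```

Under these assumptions `gram` is three times the identity, and `acf_sum` is zero at lags 1 and 2, with all the energy (3 rows of energy 3) at lag 0.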

This  polyphase  construction,  together  with  the  property  of  the  aperiodic  autocorrelations 
in  equation  (5),  identifies  triples  of  octary  codewords  with  special  correlation  properties, 
and explains the fine structure behaviour of second-order octary cosets observed by Paterson [9].


4.  OFDM  TRANSMISSION 

An n-carrier OFDM signal is composed by adding together n equally spaced, phase-shifted
sinusoidal carriers. Information is carried in the phase shift applied to each carrier.
If M distinct, equally-spaced phase shifts are used, then we say that the OFDM system uses
M-ary phase-shift keying, or M-PSK modulation. With carrier frequencies $f_0 + j f_s$,
$0 \le j < n$, the OFDM signal may be represented as the real part of the complex-valued function

$$S(a)(t) = \sum_{j=0}^{n-1} \omega^{a_j} e^{2\pi i (f_0 + j f_s) t}, \tag{6}$$

where the information-bearing sequence $a = (a_0, a_1, \ldots, a_{n-1})$, $a_j \in \mathbb{Z}_M$, is called an
OFDM codeword and $\omega = e^{2\pi i/M}$ is a complex M-th root of unity. This signal is transmitted
for a length of time equal to $1/f_s$, called the symbol period. In practical systems,
M is a power of 2. For M = 2, we have binary OFDM codewords and binary or BPSK
modulation, and the present paper is restricted exclusively to this case.

We define the instantaneous envelope power of the OFDM signal to be the function
$P(a)(t) = |S(a)(t)|^2$, which is an upper bound for the actual power $\mathrm{Re}(S(a)(t))^2$ of the
OFDM signal. It is straightforward to show that

$$P(a)(t) = \sum_{\ell=1-n}^{n-1} C(a)(\ell)\, e^{2\pi i \ell f_s t} = C(a)(0) + 2 \cdot \mathrm{Re} \sum_{\ell=1}^{n-1} C(a)(\ell)\, e^{2\pi i \ell f_s t},$$

where $C(a)(\ell)$ is the aperiodic auto-correlation function of the codeword a. From this last
expression, we see that the time-averaged envelope power of $S(a)(t)$ is equal to $C(a)(0) = n$, and so
the peak-to-mean envelope power ratio (PMEPR) of the signal is defined to be

$$\frac{1}{n} \sup_{0 \le f_s t < 1} P(a)(t). \tag{7}$$

A key idea from the work of Boyd [2] and later Popović [10] is to consider codewords
that are Golay complementary sequences. Suppose $a^0$ and $a^1$ are a Golay complementary
pair of length-n vectors whose values are drawn from ±1. Then we have:

$$P(a^0)(t) + P(a^1)(t) = \sum_{\ell=1-n}^{n-1} \left[ C(a^0)(\ell) + C(a^1)(\ell) \right] e^{2\pi i \ell f_s t} = C(a^0)(0) + C(a^1)(0) = 2n,$$

and hence

$$0 \le P(a^i)(t) \le 2n, \quad i = 0, 1.$$

Thus the PMEPR associated with a multi-carrier signal modulated by a codeword a from
a Golay complementary pair is at most 2, i.e. 3 dB. Since each PONS sequence is a Golay
sequence, it immediately follows that PONS-based OFDM transmission systems guarantee
a PMEPR of at most 3 dB.
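
The bound can be observed numerically. The following is a minimal sketch, assuming NumPy and an oversampled FFT as a stand-in for the supremum in (7); the function names `envelope_power` and `pmepr` and the oversampling factor are illustrative choices, not part of the paper. The two test sequences are the rows [A B] and [A −B] obtained from A = [1 1 1 −1] and B = [1 1 −1 1].

```python
import numpy as np

def envelope_power(a, oversample=16):
    """Samples of P(a)(t) = |sum_j a_j exp(2*pi*i*j*f_s*t)|^2 over one symbol period.

    The factor exp(2*pi*i*f_0*t) has modulus 1 and is dropped. np.fft.fft uses
    the opposite sign in the exponent, which only reverses the order of the
    sample points within the period.
    """
    n = len(a)
    return np.abs(np.fft.fft(a, n * oversample)) ** 2

def pmepr(a, oversample=16):
    """Sampled approximation (a lower bound) to the PMEPR defined in (7)."""
    return envelope_power(a, oversample).max() / len(a)

a0 = np.array([1, 1, 1, -1, 1, 1, -1, 1])      # row [A B]
a1 = np.array([1, 1, 1, -1, -1, -1, 1, -1])    # row [A -B]

total = envelope_power(a0) + envelope_power(a1)   # equals 2n = 16 at every sample
ratio_golay = pmepr(a0)                            # does not exceed 2
ratio_ones = pmepr(np.ones(8))                     # equals n = 8 for the all-ones codeword
```

The all-ones codeword is included for contrast: all carriers add in phase at t = 0, so its PMEPR is n rather than at most 2.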


5. REED-MULLER CODES

We consider $\mathbb{Z}_2$-valued sequences of length $2^m$. Let $x_0$ be the all-1's sequence. For
$i = 1, 2, \ldots, m$, let $x_i$ be $2^{i-1}$ concatenated copies of the sequence comprising $2^{m-i}$ 0's
followed by $2^{m-i}$ 1's. Then $x_0, x_1, \ldots, x_m$ form the rows of a generator matrix for the
first-order Reed-Muller code RM(1, m). These sequences, together with the componentwise
products $x_i x_j$ for $1 \le i < j \le m$, form the rows of a generator matrix for the
second-order Reed-Muller code RM(2, m).

A generator matrix G produces a code C in the sense that the $\mathbb{Z}_2$-linear combinations of the
rows of G yield a set of length-n $\mathbb{Z}_2$-valued vectors, called codewords. By a coset of C,
we mean a set of the form a + C where a is some fixed vector, the coset representative,
over $\mathbb{Z}_2$.

Expressed in terms of Boolean functions, the general r-th order binary Reed-Muller
code RM(r, m) of length $2^m$ is defined to be the binary code whose codewords are the
vectors identified with the Boolean functions of degree at most r in $x_1, \ldots, x_m$.

The code RM(r, m) is linear, and has minimum Hamming distance $2^{m-r}$ [8]. The binary
Reed-Muller codes were first presented in 1954, and while they are arguably one of
the best understood families of codes, it is only recently that the connection with Golay
complementary sequences was established. The central result of [5], [6] is:

Theorem 5.1. The codeword

$$\sum_{k=1}^{m-1} x_{\pi(k)} x_{\pi(k+1)} + \sum_{k=0}^{m} c_k x_k \tag{8}$$

is a binary Golay sequence of length $2^m$ for any permutation π of {1, 2, …, m} and
for any coefficients $c_k \in \{0, 1\}$.


The second term in equation (8) produces the codewords of the first-order Reed-Muller
code RM(1, m), while the first term, interpreted as coset representative, generates
m!/2 Boolean functions, each of which is identified with a codeword from RM(2, m).
There are only m!/2 rather than m! such terms, since the expression $\sum_{k=1}^{m-1} x_{\pi(k)} x_{\pi(k+1)}$
is invariant under the mapping $\pi \mapsto \pi'$, where $\pi'(k) = \pi(m + 1 - k)$.

Equation (8) therefore determines $2^m m!$ binary Golay sequences of length $2^m$, represented
as m!/2 distinct cosets of RM(1, m), each containing $2^{m+1}$ codewords. The
existence of at least this many length-$2^m$ binary Golay sequences was noted in [7]. The
code consisting of all sequences identified in Theorem 5.1 is a subcode of RM(2, m) and
therefore has a minimum distance of at least $2^{m-2}$.

While the details of the encoding and decoding scheme are discussed at length in references
[6], [9], the essence can be conveyed with a simple example from [5]. For m = 3
there are three choices of coset representative, namely $x_1x_2 + x_2x_3 = 00010010$,
$x_1x_3 + x_2x_3 = 00010100$, and $x_1x_2 + x_1x_3 = 00000110$. One of two coset representatives (say
the first two) is selected according to the value of one data bit, and to this is added the
encoded value $\sum_j c_j x_j$ of four further data bits $(c_1, c_2, c_3, c_4)$ to produce an 8-bit transmitted
codeword with PMEPR at most 3 dB.
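
The m = 3 example can be reproduced with a short sketch. It assumes NumPy, the hypothetical helper names `rm_generators`, `golay_codewords` and `pmepr`, and the mapping 0 ↦ +1, 1 ↦ −1 for the PMEPR check, which is evaluated on an oversampled FFT grid rather than as a true supremum. It enumerates every codeword of the form (8) and confirms the count $2^m m!$ together with the 3 dB bound.

```python
import numpy as np
from itertools import permutations, product

def rm_generators(m):
    """Rows x_0, ..., x_m of the RM(1, m) generator matrix as 0/1 arrays."""
    x = [np.ones(2**m, dtype=int)]
    for i in range(1, m + 1):
        block = np.concatenate([np.zeros(2**(m - i), dtype=int),
                                np.ones(2**(m - i), dtype=int)])
        x.append(np.tile(block, 2**(i - 1)))
    return x

def golay_codewords(m):
    """All distinct codewords of the form (8), as a set of bit tuples."""
    x = rm_generators(m)
    words = set()
    for pi in permutations(range(1, m + 1)):
        quad = sum(x[pi[k]] * x[pi[k + 1]] for k in range(m - 1)) % 2
        for c in product([0, 1], repeat=m + 1):
            lin = sum(ck * xk for ck, xk in zip(c, x)) % 2
            words.add(tuple((quad + lin) % 2))
    return words

def pmepr(bits, oversample=16):
    """Sampled PMEPR of a binary codeword after mapping 0 -> +1, 1 -> -1."""
    a = 1 - 2 * np.array(bits)
    return (np.abs(np.fft.fft(a, len(a) * oversample)) ** 2).max() / len(a)

x = rm_generators(3)
rep = (x[1] * x[2] + x[2] * x[3]) % 2      # coset representative x1x2 + x2x3
words = golay_codewords(3)                  # 2^3 * 3! = 48 codewords
worst = max(pmepr(w) for w in words)        # does not exceed 2
```

The set removes duplicates automatically, so its size directly reflects the invariance under reversal of the permutation noted above.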

For m = 4, Table 1 in [6] explicitly lists the m!/2 = 12 coset representatives generated
according to (8), representing a total of $2^4 \cdot 4! = 384$ length-16 binary sequences. By inspection,
the first coset representative in [6, Table 1] is equal (after mapping 0 ↦ 1, 1 ↦ −1)
to the first row of the original PONS matrix of order 16 or, equivalently, the first 16 terms
of the classical Shapiro sequence. Exhaustive calculation with this example shows that the
whole coset of RM(1, 4) generated by this particular coset representative according to (8)
produces all 16 rows of the original PONS matrix of order 16, together with their antipodal
counterparts.

Theorem 5.2. The matrix consisting of the rows of the original PONS matrix of
order $2^m$, together with their antipodal counterparts, can be identified with a coset of
the first-order Reed-Muller code RM(1, m) inside the second-order code RM(2, m).
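
For m = 4 the statement of Theorem 5.2 can be checked by brute force. The sketch below is an illustration only: it assumes NumPy, the hypothetical helpers `pons` and `rm_generators`, the row ordering [A B], [A −B], [B A], [−B A] for the concatenation rule, and the coset representative $x_1x_2 + x_2x_3 + x_3x_4$ (the identity permutation in (8)).

```python
import numpy as np
from itertools import product

def pons(m):
    """+/-1 matrix of order 2**m built from P_2 by the concatenation rule."""
    P = np.array([[1, 1], [1, -1]])
    for _ in range(m - 1):
        rows = []
        for k in range(0, P.shape[0], 2):
            A, B = P[k], P[k + 1]
            rows += [np.concatenate([A, B]), np.concatenate([A, -B]),
                     np.concatenate([B, A]), np.concatenate([-B, A])]
        P = np.array(rows)
    return P

def rm_generators(m):
    """Rows x_0, ..., x_m of the RM(1, m) generator matrix as 0/1 arrays."""
    x = [np.ones(2**m, dtype=int)]
    for i in range(1, m + 1):
        block = np.concatenate([np.zeros(2**(m - i), dtype=int),
                                np.ones(2**(m - i), dtype=int)])
        x.append(np.tile(block, 2**(i - 1)))
    return x

m = 4
P = pons(m)
# Rows and their antipodal counterparts, mapped +1 -> 0 and -1 -> 1.
pons_bits = {tuple((1 - r) // 2) for r in np.vstack([P, -P])}

x = rm_generators(m)
rep = sum(x[k] * x[k + 1] for k in range(1, m)) % 2      # x1x2 + x2x3 + x3x4
coset = {tuple((rep + sum(c * xk for c, xk in zip(cs, x))) % 2)
         for cs in product([0, 1], repeat=m + 1)}

same = pons_bits == coset      # the 32 signed rows coincide with the coset
```

Both sets contain $2^{m+1} = 32$ words, so equality means the signed PONS rows fill the coset exactly.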
Increasingly, systems are being designed to account for impulsive behaviour
that may be present in signals, with one of the prominent statistical
models used being the α-stable distribution. Two techniques are presented
that test for the level of impulsive behaviour, specifically by testing the parameter
α. The bootstrap is used in both cases to approximate the distribution
of the test statistics and in the setting of critical values. Simulation
results show that both tests are able to distinguish between non-impulsive
(Gaussian) and impulsive (non-Gaussian) α-stable distributions.


Key Words: impulsive interference, alpha-stable distribution, parametric bootstrap,
characteristic function, goodness-of-fit tests


1.  INTRODUCTION 

Statistical models that incorporate impulsive behaviour have found use in the
analysis of atmospheric communication channels, underwater acoustic signals, radar
systems, economic time series and biomedical signals [7]. The alpha stable (αS)
distribution has been prominently used in many of these cases. This may be due to
physical reasons - interference from spatially Poisson distributed scatterers is αS
distributed - or merely due to its general form - it is a broad family of distributions.

The αS distribution is a four parameter distribution defined in terms of its
characteristic function

$$\phi(t) = \exp\{ j\delta t - |ct|^{\alpha} [1 - j\beta\,\mathrm{sgn}(t)\,\omega(t, \alpha)] \}$$

where

$$\omega(t, \alpha) = \begin{cases} \tan(\alpha\pi/2), & \alpha \neq 1 \\ -(2/\pi)\log|t|, & \alpha = 1 \end{cases}$$

and sgn(t) is the signum function [3]. Herein, α, 0 < α ≤ 2, is the characteristic
exponent, β, −1 ≤ β ≤ 1, the skewness parameter, c, c > 0, the scale parameter
and δ, −∞ < δ < ∞, the location parameter.

The characteristic exponent, α, is a measure of how impulsive the distribution
is. The smaller α, the more impulsive is the αS distributed process, that is, the
more outliers occur in an observed series. When α = 2, its maximum value, the αS
distribution is equivalent to the Gaussian distribution.

The degree of impulsiveness is an important feature in many applications. For
example, in a radar application a detector designed for Gaussian interference may be
used if α = 2, rather than a more complicated detector for signals in αS interference.

To test impulsive behaviour we suggest a test for α. The procedures presented
are applicable in principle to all four parameters, however we will focus on testing
α only, and, more specifically, on the special case of testing for the Gaussian
distribution (α = 2) against non-Gaussian, stable distributions.

Throughout this paper we have used the parameter estimation procedure suggested
by Koutrouvelis in [6]. By manipulation of the characteristic function of
the αS distribution it was shown that estimators of the parameters could be found
through a regression technique. The estimators are consistent and asymptotically
unbiased. Tabulated values of the MSE (and a number of other statistical quantities)
of the estimators were presented for some parameter settings. These quantities
were derived through Monte Carlo simulations in [6]. Expressions for the asymptotic
properties of the estimators are unavailable.

In the following section we discuss the hypothesis testing problem for impulsive
behaviour using bootstrap methods. Following this, a test procedure is derived
in section 3 that tests for the value of α. An alternative approach, using the
characteristic function, is presented in section 4. In section 5, simulation results are
presented and conclusions are drawn in section 6.

2.  HYPOTHESIS  TESTING  WITH  THE  BOOTSTRAP 

The  bootstrap  is  a  simple  automatic  procedure  that  can  take  the  place  of  analytic 
analysis.  It  can  be  used  to  estimate  the  sample  distribution  of  statistics  when 
standard  methods  cannot  be  applied.  Observations  are  randomly  resampled  and 
the  statistics  re-computed  -  mimicking  the  process  of  repeating  the  experiment. 
When  this  is  done  a  large  number  of  times,  the  distribution  of  the  re-computed 
values  approximates  the  distribution  of  the  statistic.  The  principle  of  hypothesis 
testing  using  the  bootstrap  is  discussed  in  [4,  9]. 

Consider the αS family of distributions, indexed by the parameter vector
$p = [\alpha\ \beta\ c\ \delta]^T \in \mathcal{P}$, $\mathcal{F} = \{F_p : p \in \mathcal{P}\}$. Further, define two disjoint subsets of this family
$\mathcal{F}_0 = \{F_{p_0}\}$ and $\mathcal{F}_1 = \{F_{p_1}\}$. These two sets are indexed by parameter sets $\mathcal{P}_0$
and $\mathcal{P}_1$ which span the parameter space $\mathcal{P}$, that is, $\mathcal{P}_0 \cup \mathcal{P}_1 = \mathcal{P}$ and $\mathcal{P}_0 \cap \mathcal{P}_1 = \emptyset$.

Let the observations $X = X_1, X_2, \ldots, X_n$ be independent and identically distributed
having distribution $F_p$. We wish to test the hypothesis

H: $F_p \in \mathcal{F}_0$ versus K: $F_p \in \mathcal{F}_1$

or alternatively, defined in terms of the parameter space

H: $p \in \mathcal{P}_0$ versus K: $p \in \mathcal{P}_1$

If we can assume X is symmetrically distributed about the origin, that is, β = 0
and δ = 0, and we assume unit scale, c = 1, the hypothesis becomes one of testing
the parameter α

H: $\alpha = \alpha_0$ versus K: $\alpha \neq \alpha_0$

To test the hypothesis, we define a test statistic T(X) = T with distribution
$K_{T,p}(x) = P_p\{T \le x\}$, where $P_p$ is the probability corresponding to $F_p$. We can
now find the bootstrap estimator $\hat{q}$ of the critical value q corresponding to γ,
the desired level of significance. The distribution of the test statistics of independent
bootstrap resamples, T*, approximates the distribution of T. Consequently, q can
be approximated by an order statistic of T*.

Under some assumptions on the distribution of T under H and K, it has been
proven that the bootstrap test $T(X) \underset{H}{\overset{K}{\gtrless}} \hat{q}$ is asymptotically correct and consistently
uniform in p [8].


3. TESTING THE α PARAMETER

The test for α suggests the use of the test statistic

$$T_\alpha = \frac{\hat{\alpha} - \alpha_0}{\hat{\sigma}_{\hat{\alpha}}}$$

where $\hat{\alpha}$ is an estimate of α derived from the observations and $\hat{\sigma}_{\hat{\alpha}}$ is an estimate of its
standard deviation. This quantity is an approximate pivot, meaning its distribution
is approximately independent of any unknowns.

In Table 1, we present the proposed bootstrap based procedure for testing the α
parameter. We also show the technique in block diagram form in Fig 1. Thick arrows
indicate the bootstrap replications generated through the αS random number
generator (RNG). Estimates of $\hat{\sigma}^2_{\hat{\alpha}}$, the variance of the estimate $\hat{\alpha}$, and $\hat{\sigma}^2_{\hat{\alpha}^*}$, the variance
of the bootstrapped estimates $\hat{\alpha}^*$, are obtained using nested bootstrap stages.
Details can be found in [11].


FIG. 1. Block diagram of the bootstrap test for α.
Step 1. Parameter estimation. Find the parameter estimate $\hat{\alpha}$ from the sample X.

Step 2. Parametric resampling. Using a pseudo-random number generator, generate a
random sample X* of the same size as X, from an αS distribution with parameter $\hat{\alpha}$.

Step 3. Calculation of the bootstrap statistic. From X*, calculate T*.

Step 4. Repetition. Repeat steps 2 and 3 many times to obtain a total of B bootstrap
statistics $T^*_1, T^*_2, \ldots, T^*_B$ and the test statistic T.

Step 5. Ranking. Rank the collection $T^*_1, T^*_2, \ldots, T^*_B$ into increasing order to obtain
$T^*_{(1)} \le T^*_{(2)} \le \cdots \le T^*_{(B)}$.

Step 6. Test. A bootstrap test has then the following form: reject H if $T < T^*_{(C)}$, where
the choice of C determines the level of significance of the test and is given by $C = \lfloor \gamma(B+1) \rfloor$,
where γ is the nominal level of significance [4].

TABLE 1

The bootstrap principle for testing the hypothesis H: $\alpha = \alpha_0$
against K: $\alpha < \alpha_0$.
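
The six steps of Table 1 can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes the symmetric, unit-scale case (β = 0, c = 1, δ = 0), draws αS samples with the Chambers-Mallows-Stuck method, and replaces the full Koutrouvelis procedure of [6] with a single regression of log(−log|ecf|²) on log|t|. All function names, the frequency grid and the clipping of the resampling parameter to (0, 2] are assumptions made for this example.

```python
import numpy as np

def sas_rvs(alpha, size, rng):
    """Symmetric alpha-stable samples (beta=0, c=1, delta=0), Chambers-Mallows-Stuck."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(u)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

def estimate_alpha(x, K=10):
    """Slope of log(-log|ecf(t)|^2) against log t (simplified stand-in estimator)."""
    t = np.pi * np.arange(1, K + 1) / 25.0              # assumed frequency grid
    ecf = np.exp(1j * np.outer(t, x)).mean(axis=1)      # empirical characteristic function
    y = np.log(-np.log(np.abs(ecf) ** 2))
    return np.polyfit(np.log(t), y, 1)[0]

def alpha_std(alpha_gen, n, rng, B2):
    """Nested parametric bootstrap estimate of the standard deviation of the estimator."""
    reps = [estimate_alpha(sas_rvs(alpha_gen, n, rng)) for _ in range(B2)]
    return np.std(reps, ddof=1)

def bootstrap_alpha_test(x, alpha0=2.0, gamma=0.1, B=299, B2=25, seed=0):
    """Test H: alpha = alpha0 against K: alpha < alpha0 (assumes gamma*(B+1) >= 1)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    alpha_hat = estimate_alpha(x)                        # Step 1
    a_gen = float(np.clip(alpha_hat, 0.1, 2.0))          # valid parameter for resampling
    T = (alpha_hat - alpha0) / alpha_std(a_gen, n, rng, B2)
    T_star = np.empty(B)
    for b in range(B):                                   # Steps 2 to 4
        a_star = estimate_alpha(sas_rvs(a_gen, n, rng))
        a_star_gen = float(np.clip(a_star, 0.1, 2.0))
        # Centred at the resampling parameter, not at alpha0.
        T_star[b] = (a_star - a_gen) / alpha_std(a_star_gen, n, rng, B2)
    T_star.sort()                                        # Step 5
    C = int(np.floor(gamma * (B + 1)))                   # Step 6
    crit = T_star[C - 1]
    return {"T": T, "critical_value": crit, "C": C, "reject": bool(T < crit)}
```

A call such as `bootstrap_alpha_test(x, B=299, B2=25)` mirrors the replication counts quoted in the simulation study; smaller values of `B` and `B2` run faster at the cost of a coarser critical value.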


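Importantly, the distribution of $T_\alpha$ is approximated by the distribution of

$$T^* = \frac{\hat{\alpha}^* - \hat{\alpha}}{\hat{\sigma}_{\hat{\alpha}^*}}$$

and not by the distribution of $(\hat{\alpha}^* - \alpha_0)/\hat{\sigma}_{\hat{\alpha}^*}$ or of $\hat{\alpha}^* - \hat{\alpha}$. This has been shown to keep
the actual level of significance closer to the nominal level [4, 11]. The bootstrap
distribution of $T^* = (\hat{\alpha}^* - \hat{\alpha})/\hat{\sigma}_{\hat{\alpha}^*}$ approximates the distribution of $T_\alpha = (\hat{\alpha} - \alpha_0)/\hat{\sigma}_{\hat{\alpha}}$ better
under H than the distribution of $\hat{\alpha}^* - \hat{\alpha}$ approximates the distribution of $\hat{\alpha} - \alpha_0$
in terms of finding the scaling or dispersion of T.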

4.  CHARACTERISTIC  FUNCTION  BASED  TEST 

Differences between characteristic functions have been used extensively to test
for changes in distributions. This is especially so for Gaussianity testing, where the
technique originated [2, 5]. However, it was frequently noted that one of the major
advantages of characteristic function (cf) based goodness-of-fit testing was that it
could be adapted to test for almost any distribution, as long as the cf is specified.
This led to its recent application to testing for the αS distribution, against all
other distributions, in [1].

Here, a parametric form of the cf based tests is formed to accommodate the
additional knowledge / assumption that we are operating within a defined family
of cfs. A parametric estimate of the cf of the generating process is compared to the
cf of the distribution under H, $\phi(t, p_0)$, rather than a nonparametric estimate, such
as the empirical characteristic function. This parametric estimate is found by using
the estimated parameter values and the known form of the cf of αS distributions
and is denoted $\phi(t, \hat{p})$.

In [10] it was found that the peak absolute difference between two cfs provided a
good measure of the distance between the two distributions. Drawing on this, we
define our test statistic to be

$$T_\phi = \max_t |\phi(t, \hat{p}) - \phi(t, p_0)|.$$

The parameter vector, $p = [\alpha\ \beta\ c\ \delta]^T$, has all four parameters of the distribution.
This highlights an advantage of this test statistic over the statistic defined in
section 3, namely, it incorporates all parameters into the test. This allows its use
in a broader range of problems.

The distribution of $T_\phi$ is complicated and unknown, so we again draw on the parametric
bootstrap to determine critical values. We will approximate this distribution
by that of

$$T^*_\phi = \max_t |\phi(t, \hat{p}^*) - \phi(t, \hat{p})|.$$

The procedure is presented in Fig 2.


FIG.  2.  Block  diagram  of  the  characteristic  function  based  test. 
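
The peak absolute difference between two stable characteristic functions is straightforward to evaluate on a grid. The sketch below assumes NumPy, the four-parameter cf in the form given in section 1, and a finite grid of t values that excludes t = 0; the names `stable_cf` and `cf_statistic` and the grid limits are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def stable_cf(t, alpha, beta=0.0, c=1.0, delta=0.0):
    """Characteristic function of the alpha-stable law, parameters (alpha, beta, c, delta)."""
    t = np.asarray(t, dtype=float)
    if alpha == 1.0:
        w = -(2.0 / np.pi) * np.log(np.abs(t))
    else:
        w = np.tan(alpha * np.pi / 2.0)
    return np.exp(1j * delta * t
                  - np.abs(c * t) ** alpha * (1 - 1j * beta * np.sign(t) * w))

def cf_statistic(p_hat, p0, t_grid=None):
    """Maximum over the grid of |phi(t, p_hat) - phi(t, p0)|."""
    if t_grid is None:
        t_grid = np.linspace(0.01, 5.0, 500)   # assumed grid; t = 0 is excluded
    return np.max(np.abs(stable_cf(t_grid, *p_hat) - stable_cf(t_grid, *p0)))

gaussian = (2.0, 0.0, 1.0, 0.0)     # (alpha, beta, c, delta) under H
impulsive = (1.0, 0.0, 1.0, 0.0)    # a symmetric alternative with alpha = 1

stat_same = cf_statistic(gaussian, gaussian)      # zero when the parameters agree
stat_diff = cf_statistic(impulsive, gaussian)     # grows as alpha moves away from 2
```

In a bootstrap test, `p_hat` would come from a parameter estimator applied to the data, and the same function would be evaluated with resampled estimates in place of `p_hat` and the original estimate in place of `p0`.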


5. SIMULATION RESULTS AND DISCUSSION

A simulation study was undertaken to determine the performance of the two
tests. Here we consider $\alpha_0 = 2$, that is, we are testing the hypothesis that the
observations are Gaussian distributed against non-Gaussian αS distributed. This
is probably the most important case to consider as it tests if the observations have
finite or infinite variance.

Rejection rates for a number of values of α are presented in Tables 2 and 3 for
observation sample sizes of 200 and 400 respectively. The nominal significance level
was set at 10%, the number of bootstrap replications was 300 and 25 replications
were used for the bootstrap variance estimator in the evaluation of the test statistics.

TABLE 2

Rejection rates (in %) for sequence lengths of 200 based on 300 replications.

α        1.7     1.8     1.9     1.95    2
T_α      97.0    88.0    56.3    27.0    7.7
T_φ      93.3    73.3    37.7    11.0    2.3

TABLE 3

Rejection rates (in %) for sequence lengths of 400 based on 300 replications.

α        1.7     1.8     1.9     1.95    2
T_α      100     99.3    76.3    50.3    6.3
T_φ      100     95.3    61.3    29.0    0.3

Inspection of the results shows that directly testing α through the statistic $T_\alpha$
yielded higher rejection rates than the cf based technique, $T_\phi$. However, it should
be remembered that the cf based test is simpler to adapt to the case where more
than one parameter is to be tested. The performance of the cf based test may be
improved through the use of a pivotal statistic based on $\phi(t, \hat{p})$.

As expected, rejection rates decrease as α approaches $\alpha_0 = 2$ and when fewer
observations are available. The achieved level is well below the nominal level,
especially for the un-standardised $T_\phi$.

6.  CONCLUSIONS 

Two tests have been presented for testing the parameter values of an αS distribution.
The bootstrap procedures implemented have been shown to allow the appropriate
setting of critical values for the test that have maintained the nominal level
of significance. Although testing the α parameter directly yielded a more powerful
test, it is to be noted that the characteristic function based procedure has a high
degree of flexibility. Simulation results show that both reject the non-Gaussian
alternatives tested with rates varying depending on the degree of impulsive behaviour
and the number of observations available.